Abstract
Information systems (IS) scholarship and practice aim to predict phenomena and outcomes of IS use. These phenomena of IS use are typically set in multi-leveled, dynamic, and complex contexts that lend themselves to explanation in the non-positivist tradition of IS research. However, limited methodological options exist to make predictions. In this research, we propose stratified agent-based modeling, a step-by-step approach that enables prediction in non-positivist paradigms. Drawing upon the critical realist philosophy of science, which suggests ontological stratification and assumes open systems, we adopt retroduction-based explanation formation and agent-based modeling to simulate different potential states of a complex system. The critical step in combining critical realism with agent-based modeling involves identifying and codifying the underlying generative mechanisms (i.e., causal powers) into the various components of the agent-based model. We propose four steps toward prediction under the critical realist paradigm: (1) capturing the phenomenon, (2) identifying the generative mechanism, (3) building the agent-based model, and (4) simulating states of the system. We present an exemplar of our proposed approach that investigates the effectiveness of strategies to combat malicious content propagation in social networks.
Keywords
Introduction
For centuries, the desire to predict has pervaded nearly all aspects of knowledge discovery. Empirical sociologists like Florian Znaniecki called for data-driven predictions as early as the beginning of the twentieth century (Clauset et al., 2017). Organizations today predict to run their operations. For example, retailers predict consumer shopping patterns to anticipate consumer needs (Chaudhuri et al., 2021). Scientists predict to better human life. For example, clinicians predict patient health outcomes to mitigate adverse health events (Ferrario et al., 2023). Even academic researchers in general, and Information Systems (IS) researchers in particular, predict which research questions will be interesting and impactful. For example, Brynjolfsson et al. (2021) present challenging questions for investigating the economics of digitization and information technology while urging IS researchers to expand their questions, methodologies, and outcomes. Despite this pervasive need for predictions, remarkably few predictions are made. In the modern era of big data, artificial intelligence, and machine learning, positivist paradigms have made a noticeable leap in predicting scientific discoveries. Yet, predictions in non-positivist paradigms remain very restricted.
While positivism allows for the constant conditions and stable assumptions required to make valid predictions, non-positivist paradigms (e.g., critical realism, interpretivism; Bhaskar, 1975; Bourdieu, 1977; Giddens, 1984) restrict the methodological space toward prediction by assuming self-organization, non-determinism, and probabilistic outcomes (Wynn and Williams, 2012). Disciplines such as Management Science and IS have extensively relied on such non-positivist paradigms for various reasons. For example, taking the perspective of socio-technical systems, IS scholars have underscored the importance of studying technology as more than a black box that produces organizational outcomes (e.g., Orlikowski, 1992). Instead, they have suggested studying the materiality of technologies against the background of, and in use by, individuals with agency and different interests, grouped in organizations in specific (competitive) environments, to understand the complex and highly contextualized outcomes of IS use (e.g., Strong et al., 2014; Volkoff et al., 2007). Moreover, outcomes of IS use are often multi-leveled, emergent, and interdependent. In organizational settings, IS are used by individual users for whom they create immediate, concrete outcomes, which aggregate at higher organizational levels (e.g., teams, organizational units) or even systemic levels (e.g., healthcare systems; see Strong et al., 2014). Given the inherent complexity of IS research contexts and the interdependency of social settings and technology artifacts, many IS scholars have adopted a critical realist stance.
Critical realism is a philosophy of science that has been used to study IS use in complex and contextualized settings such as information technology (IT) governance (Williams and Karahanna, 2013), IT infrastructure evolution (Henfridsson and Bygstad, 2013), and IT implementation (Lauterbach et al., 2020). Its two basic presumptions––ontological stratification and an open system assumption (Mingers et al., 2013)––have made critical realism particularly suited to study causality in networked and multi-level contexts (Wynn and Williams, 2012). The stratified ontology of critical realism conceptually separates empirically observable events from the causal powers (i.e., generative mechanisms) that cause them (Bhaskar, 1975). Generative mechanisms are causal powers that explain empirical events (Bygstad et al., 2016). Moreover, in this paradigm, the world is viewed as an open system in which events are not deterministic but probabilistic and dependent on the action of various mechanisms. Thus, depending upon conditions and the actions of individual agents, the same generative mechanisms may give rise to different empirical patterns (i.e., contingent causality; Sayer, 1992). While the value of the critical realist paradigm is uncontested for studying causality (e.g., Miller, 2015) and the evolution of systems (e.g., Henfridsson and Bygstad, 2013), prediction under its assumptions of non-constant conditions and contingent causality remains a challenge (Wynn and Williams, 2012).
As bottom-up modeling approaches such as agent-based modeling have gained traction in recent years, scholarship has produced the theoretical and methodological groundwork to study complex, self-organizing, constantly adapting social systems (Drazin and Sandelands, 1992; Holland, 1992; Nan, 2011). Agent-based models (ABMs) are conceptualized as systems of individual agents interacting with each other at the micro-level according to specific sets of rules (Holland, 1992). These micro-level interactions of agents collaboratively create empirical patterns at a macro-level (Amaral and Uzzi, 2007; Nan, 2011) that can be studied using ABMs. Agent-based modeling has found wide adoption in IS research to computationally model and simulate a variety of IS use processes (e.g., Haki et al., 2020; Ross et al., 2019).
We aim to leverage agent-based modeling to unlock prediction in non-positivist paradigms because the basic premises of ABMs and critical realism align well. Prior literature provides evidence for this alignment. First, Miller (2015) contends that agent-based modeling and critical realism build on similar conceptualizations of agents, conditions, and events or interactions between agents, with the objective of producing process theoretic explanations of phenomena. Second, both Miller (2015) and Valogianni et al. (2023) argue that emergence is a central element in both ABM and critical realism. This stream argues that macro-level patterns emerge from adaptive micro-level interactions of agents, allowing for multiple different outcomes (i.e., multifinality). However, little guidance exists on how to isolate the causal powers from the phenomena they generate (i.e., ontological stratification) and then codify them into the model components. Thus, the goal of this research is to answer the following research question: How can generative mechanisms be identified and codified into agent-based models to enable prediction in non-positivist paradigms such as critical realism?
We propose the stratified ABM, a step-by-step approach that guides researchers in uncovering generative mechanisms and codifying them into the components of ABMs—namely, agent attributes, behavioral rules, and interactions between agents. The stratified ABM enables scholars to explore predictive research questions within non-positivist paradigms, such as critical realism, while catering to the open system, dynamic, and adaptive premises of the paradigm. We introduce to bottom-up modeling the conceptual separation between empirical patterns and causal powers that is central to critical realism. The causal powers––or generative mechanisms––govern micro-level interactions as well as emergent macro-level patterns. For this, we outline a four-step approach: (1) the empirical phenomenon at hand is decomposed into component events and theoretically redescribed; (2) the generative mechanism(s) at play are inferred through retroduction; (3) the identified generative mechanism(s) are codified into different ABM components, translated into computational parameters and algorithms, and implemented in a programming tool; and (4) potential states of the system are simulated, and outcomes predicted. To demonstrate the purposefulness of the stratified ABM, we use it to predict the effectiveness of account removal strategies in fighting malicious content propagation in social networks. Malicious content represents a major threat to social networks as it is intentionally designed and spread to influence the opinion climate and real-world behaviors of social network users (Vosoughi et al., 2018). Using the stratified ABM, we identify the generative mechanisms of malicious content propagation––preferential attachment and limited information processing. We codify these generative mechanisms into the components of our ABM as agents’ attention span, the logic of connecting with other users, and posting and reposting behaviors; the ABM seeks to emulate the real-world Twitter network. Finally, we simulate the outcomes of different levels of account removal and predict the effectiveness of the interventions.
Our proposal of stratified ABM contributes to scholarship in two ways. First, in proposing the stratified ABM, we address the pragmatic need in scholarship and practice to make stable predictions. At the same time, we attend to the requirement to capture dynamism, adaptivity, contextualization, and the multi-level nature of studies of IS use. This need has been recognized in prior research, which has deemed ABMs a suitable technique in critical realist studies (Miller, 2015; Valogianni et al., 2023). We expand this line of work by offering practical guidance to make valid predictions without sacrificing or simplifying the complexity of a system under investigation. Therefore, the proposed methodological approach helps overcome a key challenge in non-positivist paradigms.
Second, we contribute to the agent-based modeling literature. Our work expands previous concepts of merging critical realism and agent-based modeling (Miller, 2015). Stratified ABM provides step-by-step guidance for identifying generative mechanisms and then codifying them into ABM simulations, and thereby fosters maximum phenomenon-centricity. We encourage the use of agent-based modeling to explore and predict research problems that assume a stratified ontology in which causal powers only become empirically manifested through the events they cause.
The remainder of the paper is structured as follows. In the next section, we outline the basic ontological premises of critical realism followed by an introduction to bottom-up modeling and ABM in particular. The following section entails a detailed description of the stratified ABM along with its four steps. Next, we use the stratified ABM to predict the outcomes of account removal strategies on social networks. We conclude the paper with a discussion of the scholarly contributions and limitations of our research.
Foundations
Ontology in critical realism
Critical realism is a philosophy of science that stands in between, and is an alternative to, empiricism and idealism (cf. Bhaskar, 1975). The empiricist paradigm relies on a realist ontology and employs an objectivist epistemology. Thus, causality resides in the constant conjunction of empirically observable events, and theoretical perspectives are positivist. The idealist paradigm relies on a constructivist ontology and employs a subjectivist epistemology. Thus, causality is a transitive property of socially constructed realities and meanings, and theoretical perspectives are interpretivist. In contrast, critical realism reconciles a realist ontology with an interpretivist understanding of knowledge (Bygstad et al., 2016). In critical realism, “the world and entities that constitute reality actually exist ‘out there’, independent of human knowledge or our ability to perceive them” (Wynn and Williams, 2012: 790). For example, structural elements such as generative mechanisms and conditions are viewed as intransitive, that is, independent of human interference, perception, and existence (Bhaskar, 1975). In contrast, empirical events are viewed as dependent upon human perception and enactment, which renders empirical observations subjective, socially constructed, and thus fallible (Wynn and Williams, 2012). In other words, causality is viewed as an intransitive property of reality, but the empirically observable events following from it and the scientific knowledge about them are shaped by human action (Bhaskar, 1975). Accordingly, material structure and human agency are carefully separated (Volkoff et al., 2007).
Critical realism has rapidly gained popularity in IS research in recent years due to its ontological presumptions, which make it particularly useful to study causality in networked and multi-level contexts. Critical realism is well suited for process theoretic pursuits that seek to understand the causal mechanisms driving the emergence of system dynamics (Miller, 2015). For example, IS scholars have relied on critical realist inquiry to uncover dynamics in such contexts as IT governance (Williams and Karahanna, 2013), IT implementation (Lauterbach et al., 2020), infrastructure evolution (Bygstad et al., 2016; Henfridsson and Bygstad, 2013), and ICT adoption (Zachariadis et al., 2013). Moreover, critical realism underlies several popular IS theories, such as representation theory (Weber, 1987), affordance theory (relational view; Volkoff and Strong, 2013; Strong et al., 2014), and the theory of effective use (Burton-Jones and Grange, 2013).
The critical realist ontology is built upon two basic premises which substantially influence the epistemological stance and the methodological option space. First, critical realism presumes that the world is stratified into three layers (Bhaskar, 1975; Mingers et al., 2013): The surface level (or the empirical) comprises the events that humans actually observe and experience; the middle level (or the actual) comprises all events that occur, whether or not they are observed; and the deepest level (or the real) comprises the structures and generative mechanisms whose causal powers give rise to those events.
Second, critical realists view the world as an open system in which no constant conjunctions prevail (Bhaskar, 1975). In contrast, in empiricism, causality resides in the conjunction between events and, therefore, constant patterns of conjunction are assumed to occur. However, given the stratified ontology of critical realism, events occur based on the acting of generative mechanisms. Depending upon situational and contextual conditions, the same generative mechanism may cause different sequences of empirically observable events to actualize (Wynn and Williams, 2012). This circumstance indicates that multiple causal paths can lead to a particular outcome, which is referred to as contingent causality (Henfridsson and Bygstad, 2013; Sayer, 1992). In other words, in an open system, outcomes are probabilistic, not deterministic, because they depend upon the action of a generative mechanism which can cause multiple empirical outcomes (i.e., multifinality; Bygstad et al., 2016). This non-deterministic view renders critical realism useful in IS research, where disentangling research objects from their respective context proves difficult (cf. Avgerou, 2019; Volkoff et al., 2007). However, this also marks the philosophy’s Achilles heel: As critical realism requires moving beyond the empirical chain of events for causal explanation formation, traditional methods for prediction may have limited application (Wynn and Williams, 2012). In the next section, we introduce bottom-up modeling (specifically, agent-based modeling) as a class of prediction methodologies that can be aligned with a non-positivist paradigm like critical realism.
Bottom-up modeling
Bottom-up modeling of complex adaptive systems (CAS) represents a relatively new approach. CAS have been defined in the literature as “systems composed of interacting agents described in terms of rules. The agents adapt by changing their rules as experience accumulates” (Holland, 1992: 1). Rather than viewing social systems (e.g., IS, organizations) as discrete and static, CAS research recognizes them to be emergent, self-organizing, interactive, and dynamically evolving (Nan, 2011). While there is no universal view of the concepts of CAS, most scholars agree on the existence of a micro-level, which encompasses agents, their interactions, and an exogenous environment, that collaboratively creates empirical patterns on a macro-level (Nan, 2011). It is for these properties that CAS theory has been deemed useful as a foundation for computational bottom-up modeling and analytics. In this paper, we focus on agent-based modeling, a form of bottom-up modeling, which has been used in IS in conjunction with CAS theory to simulate a variety of IT use processes (e.g., Haki et al., 2020; Ross et al., 2019).
Agent-based modeling is a computational modeling approach used to simulate the behavior of complex systems composed of interacting agents. In ABM simulations, individual agents are modeled to interact with each other and an environment according to pre-defined rules (micro-level). These interactions bubble up to macro-level observations (Amaral and Uzzi, 2007). Agent-based modeling has been used in a wide range of disciplines, including sociology, ecology, economics, biology, management, computer science, and engineering. The first studies using ABMs originated in sociology, where residential segregation (Schelling, 1971) and the collapse of an indigenous civilization (Epstein and Axtell, 1996) were simulated. The method was introduced to management science with simulations of organizational learning (March, 1991). Since the 2000s, the ABM simulation literature expanded rapidly to include a wider range of disciplines and applications, such as ecology, where ABMs were used to model the dynamics of ecosystems and the interactions between species (e.g., Wilensky and Resnick, 1999), or transportation planning, where ABMs were used to model the behavior of individual travelers and the dynamics of traffic flow (e.g., Toledo et al., 2003). In recent years, the ABM simulation literature has continued to expand and mature, with a growing focus on the development of methods for model calibration, validation, and sensitivity analysis (e.g., Kwakkel, 2017; Valogianni et al., 2023). Moreover, agent-based modeling has increasingly been integrated with other modeling approaches, such as optimization models and game theory (e.g., Venkatesan et al., 2015).
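To illustrate the micro-to-macro logic of agent-based modeling, consider a minimal Python sketch in the spirit of Schelling’s (1971) segregation model. The grid size, tolerance threshold, and tick count below are illustrative choices of ours, not parameters from the original study:

```python
# Minimal Schelling-style ABM: agents of two types occupy a grid and move to a
# random empty cell whenever fewer than THRESHOLD of their neighbors share
# their type. Micro-level rules; macro-level clustering emerges.
import random

SIZE, EMPTY, THRESHOLD, TICKS = 20, 0.2, 0.5, 50

def neighbors(grid, x, y):
    cells = [grid.get(((x + dx) % SIZE, (y + dy) % SIZE))
             for dx in (-1, 0, 1) for dy in (-1, 0, 1) if dx or dy]
    return [c for c in cells if c is not None]

def unhappy(grid, pos):
    nbrs = neighbors(grid, *pos)
    return nbrs and sum(n == grid[pos] for n in nbrs) / len(nbrs) < THRESHOLD

# Micro-level setup: place two agent types at random.
grid = {}
for x in range(SIZE):
    for y in range(SIZE):
        if random.random() > EMPTY:
            grid[(x, y)] = random.choice(("A", "B"))

# Apply the micro-level rule repeatedly.
for _ in range(TICKS):
    movers = [p for p in list(grid) if unhappy(grid, p)]
    empties = [(x, y) for x in range(SIZE) for y in range(SIZE)
               if (x, y) not in grid]
    random.shuffle(empties)
    for pos in movers:
        if not empties:
            break
        grid[empties.pop()] = grid.pop(pos)

# Macro-level observable: mean share of same-type neighbors (segregation index).
scores = [sum(n == grid[p] for n in neighbors(grid, *p)) / len(neighbors(grid, *p))
          for p in grid if neighbors(grid, *p)]
print(f"Mean same-type neighbor share: {sum(scores) / len(scores):.2f}")
```

Even though no agent seeks segregation outright, the printed segregation index typically rises well above the tolerance threshold, which is exactly the kind of emergent macro-level pattern ABMs are built to surface.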
ABMs consist of three central elements: agents, interactions, and an environment, whose characteristics, properties, and behavioral rules are translated into computational parameters. An ABM encapsulates algorithms that implement the agents and their respective interactions (Nan, 2011). ABMs have proved useful in multi-leveled contexts in which macro-level patterns arise from the self-organizing, micro-level interactions of learning agents (Drazin and Sandelands, 1992; Helbing, 2012). In IS, such contexts encompass social networks, enterprise architectures, complex markets, and economics, among others (e.g., Fang et al., 2018; Haki et al., 2020; Nan and Tanriverdi, 2017). Agent-based modeling offers multiple advantages in such contexts: First, as manipulating variables and thus experimentation is difficult in multi-level contexts, ABMs can be used to conduct experiments in simulated environments (Weisburd et al., 2022). Second, ABMs have the capacity to reveal causal mechanisms of modeled phenomena and thereby advance theoretical understanding instead of merely predicting outcomes from historical observations (Miller, 2015). Third, given the simulated nature of ABMs, they provide an excellent basis to test and study scenarios and contingencies that would be difficult to implement in lab experiments and might be risky to implement in field experiments (Romme, 2004). Fourth, unlike statistical estimation and differential equations, ABMs make it possible to model adaptive phenomena and behaviors of heterogeneous, learning actors that are not necessarily governed by a centralized control mechanism (Nan, 2011).
These properties of agent-based modeling align well with the two basic premises of critical realism. Recall the premise that the world is an open system in which no constant conjunctions prevail. Emergence in critical realism is a consequence of the acting of generative mechanisms. In ABMs, micro- and macro-level interactions are emergent and adaptive, which upholds the open system assumption of critical realism. In fact, ABMs cater well to the multifinality of critical realism by handling the complexity of computing the diverse outcomes of complex interaction patterns (Miller, 2015; Valogianni et al., 2023). The literature on CAS and ABM often defines open systems as exchanging information, energy, or matter with their environment, allowing for continuous adaptation and evolution. According to Holland (1992), these systems are characterized by their capacity to adapt and reorganize in response to external changes, a critical component of their complexity. This dynamic nature of open systems, as highlighted in the CAS and ABM literature, is a source of constant exploration. This contrasts with the critical realist (CR) perspective, which focuses on the generative mechanisms that produce emergent properties. In the context of CR, open systems do not exhibit constant conjunctions or predictable outcomes due to the complex interplay of various generative mechanisms. By juxtaposing these definitions, it becomes clear that while critical realism emphasizes open systems’ unpredictability and non-deterministic nature due to internal generative mechanisms, the CAS and ABM literature highlights the dynamic interactions and adaptability of systems in response to external influences. Both perspectives underscore open systems’ complexity and emergent properties, though they focus on different aspects of what makes these systems “open.”
While different ontological and epistemological premises have been used for agent-based modeling, methodologists agree on the process theoretic (rather than variance theoretic) nature of explanations built into and obtained through ABMs (Grüne-Yanoff and Weirich, 2010; Miller, 2015). In essence, a causal explanation for system dynamics in an ABM can be viewed as a generative mechanism as in critical realism that can then be used to simulate different emergent outcomes (Miller, 2015).
However, despite the similarity in process perspective, the causal structure in ABM and critical realism requires explicit alignment. To be applicable in critical realist inquiry, researchers using ABMs should engage in explicitly adding ontological stratification to their models. Ontological stratification means that material structure and human agency are separated into distinct strata (i.e., the real that is independent of human interaction and the actual and empirical that are dependent on human interaction or at least human perception). In contrast, empirical stratification refers to the idea of separating empirical phenomena into different levels of analysis, for example, micro- versus macro-level interactions, individual versus organizational outcomes (cf. Burton-Jones and Gallivan, 2007). While the ABM methodological literature is strong on guidance with respect to empirical stratification, critical realism can offer an additional perspective to facilitate disentangling ontological stratification and thus causal structure. Thus, we propose stratified ABM which seeks to innovate ABM methodology to make it applicable in critical realist inquiry by suggesting distinct methodological steps. Following this step-by-step approach assists researchers using ABMs in conjecturing generative mechanisms, codifying them into model components, and simulating various realizations of these generative mechanisms.
Proposal of stratified ABM
Our proposed approach, stratified ABM, demonstrates how the concept of generative mechanisms can be embedded into ABMs and manifest in observable agent behaviors and interactions. Generative mechanisms manifest through events, and sometimes through agents’ interference with sequences of events. Therefore, after identifying the generative mechanisms responsible for the empirical manifestation of a specific pattern, we need to codify them into the components of an ABM. We propose four steps to build an ABM that relies on a stratified ontology, through which different states of the system can be simulated, thereby allowing for prediction in non-positivist paradigms (see Figure 1).
Figure 1. Four steps toward prediction in critical realism using stratified agent-based models.
Next, we outline the four steps of stratified ABM in detail, followed by an exemplar in which we simulate the outcomes of account removals to fight malicious content propagation in a social network.
Step 1: Capturing the phenomenon
Methodologists postulate that an ABM should be phenomenon-centric, that is, the phenomenon should inform the model and not the other way around (Miller, 2015). Thus, in line with the critical realist method for explanation formation (cf. Bhaskar, 1975), we propose to start the process of capturing the phenomenon by decomposing and theoretically redescribing it. The critical realist literature has brought forward a detailed approach to capture a phenomenon (e.g., Leidner et al., 2018). The sequences constituting a phenomenon are decomposed into the basic concepts: agents, events, and conditions. The purpose of this decomposition is to resolve a complex event into its components to prepare for ontological stratification and thus enable causal analysis.
After decomposing a complex event, the goal of the redescription step is to reconfigure the causal sequence of events to facilitate theorizing about the generative mechanisms at work. While this step may sound redundant, note that chronological adjacency does not necessarily imply a causal link between two events in critical realism. That is, the causal sequence of events and the agents involved in these events is not always the same as the empirically observable temporal patterns.
Irrespective of the nature of the data, this step is highly inductive, that is, data-driven and not theory-driven. While many critical realist studies in IS have relied on qualitative data from case study research (Wynn and Williams, 2020), we argue that this step can be applied to larger datasets equivalently. For example, in our exemplar, we draw upon a data set of more than two hundred million Tweets originating from 17 countries. We decompose this data into events, agents, and conditions to identify the generative mechanisms at play.
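For illustration, a minimal sketch of how a single raw tweet record might be decomposed into the three basic concepts; the field names are hypothetical placeholders, not the schema of our actual dataset:

```python
# Sketch of decomposing a raw record into the critical realist concepts of
# agents, events, and conditions. Field names are hypothetical.
raw_tweet = {"user_id": "u1", "action": "retweet", "source_id": "u2",
             "timestamp": "2020-03-01T12:00:00Z", "followers": 15000}

decomposed = {
    "agent": raw_tweet["user_id"],                           # who acted
    "event": (raw_tweet["action"], raw_tweet["source_id"]),  # what happened
    "conditions": {"followers": raw_tweet["followers"],      # context in which
                   "time": raw_tweet["timestamp"]},          # the event occurred
}
print(decomposed)
```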
Step 2: Identifying the generative mechanism
The critical realist methodological literature proposes identifying the generative mechanisms at play that cause the respective phenomenon through retroduction (cf. Leidner et al., 2018; Wynn and Williams, 2012). Retroduction involves reasoning backward from the observed events to the mechanisms that, if they existed and acted, would plausibly have generated those events.
Once candidate mechanisms have been identified for large parts of the event sequence, scholars end up with a set of candidate mechanisms that could be causal to the entire sequence. The generative mechanism(s) that do not allow for explaining most of the sequence are then eliminated (cf. Bhaskar, 1975). Once again, this step is highly inductive, even with large and quantitative datasets, as explanation formation happens at the conceptual level.
Step 3: Building the ABM
Modeling the agent ecosystem does not differ from traditional approaches per se. Agents are conceptualized with different attributes and behavioral rules that govern their interactions in light of an exogenous environment (Ross et al., 2019). To build an ABM in critical realist inquiry, we propose drawing on the abstracted elements identified in the first step of decomposing the phenomenon and enriching them with the detail needed to model the state of the system.
The abstracted elements identified in the first step of decomposing the phenomenon are translated into computational parameters and algorithms and then implemented in a programming tool (here, Python; Nan, 2011). These parameters inform the ABM development with respect to three major considerations: the agents’ attributes, the specifications for the interactions between agents, and the definition of the environment in which agents and interactions are situated. The elements are especially important in defining the various agent characteristics along with the behavioral rules that govern agents individually and in interaction with others within the environment. Careful implementation of these elements is therefore critical to ensure that the emergent behavior of the simulation is consistent with and appropriate to the phenomenon under investigation.
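As a minimal sketch of this translation, assuming hypothetical class names, attributes, and value ranges rather than the exemplar’s actual parameters, the three considerations might be codified as follows:

```python
# Sketch of codifying decomposed elements into ABM components. Attribute names
# and value ranges are hypothetical placeholders.
import random
from dataclasses import dataclass, field

@dataclass
class Agent:
    agent_id: int
    agent_class: str                  # agent attribute, e.g., "influencer"
    posts_per_tick: tuple             # behavioral rule: class-specific range
    followers: list = field(default_factory=list)  # interactions: who receives content

    def act(self, environment):
        # Behavioral rule: post a number of times drawn from the class range.
        for _ in range(random.randint(*self.posts_per_tick)):
            environment.publish(self)

@dataclass
class Environment:
    feed: list = field(default_factory=list)  # exogenous environment: content pool

    def publish(self, agent):
        self.feed.append((agent.agent_id, agent.agent_class))

# One tick of the simulation: every agent acts according to its rules.
env = Environment()
agents = [Agent(i, "influencer" if i < 5 else "viewer",
                (2, 3) if i < 5 else (0, 1)) for i in range(100)]
for agent in agents:
    agent.act(env)
print(f"Posts generated this tick: {len(env.feed)}")
```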
Step 4: Simulating states of the system
Finally, using the ABM, the “actual” state(s) which a system can take given the action of a generative mechanism are simulated––or predicted. Depending on the research question, the nature of the state of the system of interest may vary. For example, scholars interested in the evolution of a system may take a longitudinal perspective to explore the potential temporal patterns caused by a specific generative mechanism. Tracing the evolution of the system may uncover temporal patterns that help in analyzing the distribution of events or phenomena over time and in determining whether there are significant deviations from expected patterns. By analyzing the outputs of an ABM simulation, researchers can identify patterns in the behavior of the agents and how these patterns change over time. In contrast, scholars aiming to investigate how alterations of certain model specifications (e.g., in the environment) shape the manifestation of the generative mechanism at the empirical level may compare different kinds of manipulations against each other in an experimental fashion. For example, scholars can use ABM simulations to test out scenarios by manipulating the rules that govern agent behavior and observing the resulting changes in the emergent behavior of the system.
To assess the ABM’s robustness, reliability, and utility for understanding and predicting complex real-world phenomena, we propose the following internal and external validation approaches. Internal validation seeks to ensure that the ABM captures the authentic dynamics of the system. Despite the phenomenon-centric modeling approach proposed in stratified ABM, assessing the model’s accurate representation of the system dynamics and its consistency with the empirical data is critical (Ross et al., 2019). External validation, in turn, seeks to ensure the ABM’s predictive validity, as it gauges the model’s capacity to extend its predictions to new scenarios (Haki et al., 2020). For external validation, the developed model is exposed to new data; the emerging behavior is then captured and checked for consistency with expectations.
Exemplar: Predicting the effectiveness of social network interventions
To demonstrate the usefulness of the stratified ABM, we use the example of predicting the effects of account removal strategies to fight malicious content propagation in social networks. Malicious content refers to any content of intentionally manipulative nature which is purposefully curated and propagated by malicious agents to influence social network users’ opinions and real-world behaviors (Vosoughi et al., 2018). Besides examining the properties (e.g., linguistic, structural) of malicious content (e.g., de Lima Salge and Berente, 2017; Kim and Dennis, 2019; Lozano et al., 2020), research has investigated the empirical diffusion patterns of malicious content propagation. According to the literature, so-called bots (i.e., synthetic accounts; Salge et al., 2022) play a major role in propagating malicious content (Lou et al., 2019), which has been found to propagate faster and more broadly than legitimate content (Vosoughi et al., 2018). In fact, Ross et al. (2019) conclude based on a simulation that malicious bots making up as little as 2% of the network population may be sufficient to sway public opinion. Thus, combating the spread of malicious content is not only in the interest of social network providers but also in the public interest, as social networks have gained traction as information and news media used to form political opinions. Prior research has brought forward a rich stream of literature on network interventions (see Valente, 2012), some of which have been deployed in practice. For example, the popular social networks Twitter and Facebook have repeatedly removed accounts propagating malicious content from their networks (Dwoskin, 2018). In this section, we use the context of account removal on social networks to demonstrate the usefulness of stratified ABM. Our goals are to: (1) capture how malicious content spreads in the Twitter network; (2) identify the generative mechanisms that create the empirical propagation patterns of malicious content; (3) emulate the Twitter network using agent-based modeling in which we codify the identified generative mechanisms; and (4) predict the effectiveness of different levels of account removal by simulating different states of the system using the ABM. At the end of the section, we discuss the purposefulness of using stratified ABM in this exemplar.
Step 1: Capturing how malicious content spreads in the Twitter network
Figure 2. Theoretical redescription of malicious content propagation on Twitter.
Step 2: Identifying the generative mechanisms of malicious content propagation
Based on the theoretical redescription, we started by retroductively identifying the different generative mechanisms at play that could cause the empirical sequence of events. Note that the orange arrows in Figure 2 denote the retroduction chain that we used to conjecture the generative mechanisms behind the observed events. We contend that for each backward arrow, two aspects need to be explained: (1) the existence of a connection between two agents, and (2) agents’ selection logic of which content to propagate (e.g., why Agent B retweets Agent A’s but not Agent C’s content). In our dataset of malicious tweets, we can observe that depending on the agent classes being removed from the network, network traffic is affected in different ways (see analyses in Supplemental Appendix). For example, removing influencers, celebrities, and broadcasters significantly decreases network traffic, whereas removing viewers, commentators, and bots leaves network traffic almost unchanged. Thus, we conclude that account popularity (i.e., more followers than following) may be a factor driving the spread of malicious content. Indeed, considering the scholarly literature, algorithmic filtering in social networks has been shown to amplify content by popular accounts (Goel et al., 2016; Yoo et al., 2019). A mechanism that has been theorized to underlie the formation of social networks (Johnson et al., 2014), such as Wikipedia and Twitter (Capocci et al., 2006; Romero and Kleinberg, 2010), is preferential attachment (PA). PA describes how new users entering a social network are more likely to link to popular users than to average users of the social network (Barabasi and Albert, 1999), hence explaining the emergence of links between certain agents but not others. Since algorithmic filtering not only affects newly entering nodes but also the content feed of existing nodes (Kitchens et al., 2020; Shore et al., 2018), we conjecture that PA governs which content is viewed and is thus available for propagation. We suggest that the content appearing in users’ content feeds is selected according to a preferential logic, so that content propagated by a popular user is more likely to appear to others. Further considerations regarding PA can be found in Supplemental Appendix 2.
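Formally, in Barabasi and Albert’s (1999) formulation, the probability that a newly entering node attaches to an existing node $i$ is proportional to that node’s degree $k_i$:

$$\Pi(k_i) = \frac{k_i}{\sum_j k_j},$$

so that already-popular nodes accumulate new connections at a disproportionate rate (a rich-get-richer dynamic).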
Additionally, we found that the preferential sharing of content by popular accounts was more prevalent among human agents. Thus, we had to conjecture a second generative mechanism that coincides with PA and leads to higher levels of preferential propagation through human accounts versus synthetic accounts. We therefore conclude that the second generative mechanism must be an individual-level mechanism. Since content feeds are sorted preferentially (most popular content at the top), we assume that human agents are restricted in their perception of the content feed. We conjecture that the amount of content from a human user’s feed that can be perceived (and is thus available for propagation) is restricted by a second generative mechanism, namely, humans’ limited information processing (LIP) capacities. Presuming LIP to govern content perception and thus propagation in social networks is in line with the literature on social contagion and human learning (Chandler and Sweller, 1991; Hodas and Lerman, 2014; Islam et al., 2020). Further considerations regarding LIP can be found in Supplemental Appendix 3. We suggest that LIP reinforces the effect of PA, so that content by popular nodes is even more likely, while content by less popular nodes is even less likely, to be viewed and distributed. Thus, the two generative mechanisms that help explain the sequence of events related to malicious content propagation on the Twitter network with all its idiosyncrasies are PA and LIP.
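Anticipating the codification in Step 3, a minimal sketch of how the two mechanisms might jointly restrict which content an agent perceives; the attention-span value and the popularity-based ranking are illustrative assumptions, not the exemplar’s exact implementation:

```python
# Sketch of PA and LIP jointly governing content perception. The
# attention_span value is an illustrative assumption.
import random

def visible_posts(feed, attention_span=10):
    # PA: the feed is ranked preferentially, most popular content first.
    ranked = sorted(feed, key=lambda post: post["reposts"], reverse=True)
    # LIP: a human agent perceives only the top of the ranked feed.
    return ranked[:attention_span]

feed = [{"id": i, "reposts": random.randint(0, 500)} for i in range(200)]
perceived = visible_posts(feed)
# Only perceived posts are available for propagation, so popular content
# becomes even more likely to spread (PA reinforced by LIP).
repost = random.choice(perceived)
print(repost)
```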
Step 3: Building the Twitter ABM
Table 1. Operationalizations of key ABM components.
ABM: agent-based model; PA: preferential attachment; LIP: limited information processing.
We implement our ABM in Python using object-oriented programming features. We first generate the social network that governs agent interactions and communications. In our model, network generation entails each agent selecting other users to follow, guided by a class-specific range. To establish the follower connections between agents, we draw on past literature, particularly Barabasi and Bonabeau (2003) and Ross et al. (2019), and employ the preferential attachment (PA) mechanism. This mechanism not only influences the follower connections but also significantly impacts the subsequent activities of agents, such as reading and reposting content. The outcome is a network topology characterized by a non-homogeneous, scale-free degree distribution. In simpler terms, a few agents have a disproportionately large number of followers, in line with real-world social networks. This non-uniform distribution enhances the realism of our model by capturing the inherent heterogeneity in user connectivity. More details on the ABM instantiation can be found in Supplemental Appendix 4.
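A simplified sketch of such PA-based network generation; the follow range and the +1 seed weight (which gives nodes without followers a nonzero chance of being selected) are our assumptions, and the full instantiation is described in Supplemental Appendix 4:

```python
# Sketch of PA-based follower network generation. Ranges and weights are
# illustrative, not the exemplar's class-specific parameters.
import random

def generate_network(n_agents, follow_range=(5, 20)):
    followers = {i: set() for i in range(n_agents)}
    for agent in range(n_agents):
        for _ in range(random.randint(*follow_range)):
            # PA: selection weighted by current follower counts (+1 seed).
            weights = [len(followers[other]) + 1 if other != agent else 0
                       for other in range(n_agents)]
            target = random.choices(range(n_agents), weights=weights)[0]
            followers[target].add(agent)
    return followers

net = generate_network(500)
degrees = sorted((len(f) for f in net.values()), reverse=True)
# A few agents accumulate disproportionately many followers (scale-free-like).
print(degrees[:5], degrees[-5:])
```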
Step 4: Simulate different account removal strategies
Design of the simulation experiments
A simulation experiment depends on basic parameters such as duration (i.e., number of ticks) and size of the agent ecosystem (i.e., number of agents). Our simulation experiments used a total of 1000 agents and ran for 30 ticks, with one tick representing a day and 30 ticks representing a calendar month. As a robustness check, we also ran simulations with 10,000 agents to ensure the consistency of our findings.
A random process generated the daily actions of each individual agent on a tick-by-tick basis. For example, every agent in the network has characteristics such as the number of posts they generate, how many other users they follow, and how many posts they read and repost on a regular basis, among others. Once the simulated environment has been set up in tick 0, every user starts producing and consuming information as per the characteristics of their respective agent classes. For example, once agent A of type Influencer is instantiated, they start producing on average two to three posts per day. They also read and repost from their network as per their agent class.
The post process allows each agent to create and share content. The repost process is at the core of propagation and relies heavily on the PA mechanism––similar to the network generation phase. Whenever an agent is to act through reposting, a post is selected from a list of all previous posts. This post selection can either be chronological in nature or incorporate PA such that popular posts are given a boost. That is, under PA, the previous number of reposts determines the chances of a particular post being selected in the future. This enables certain posts to gain an early advantage and go viral, mimicking the phenomenon seen in real-world social networks.
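A minimal sketch of such PA-weighted post selection; the +1 smoothing that gives fresh posts a nonzero chance of being picked is our assumption:

```python
# Sketch of PA-weighted repost selection: prior repost counts determine the
# chances of future selection, so early advantages compound.
import random

def select_post_for_repost(posts, use_pa=True):
    if not use_pa:
        return posts[-1]  # chronological selection: most recent post
    weights = [post["reposts"] + 1 for post in posts]
    return random.choices(posts, weights=weights)[0]

posts = [{"id": i, "reposts": 0} for i in range(50)]
for _ in range(500):
    chosen = select_post_for_repost(posts)
    chosen["reposts"] += 1  # rich-get-richer: a few posts "go viral"
print(max(posts, key=lambda p: p["reposts"]))
```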
These key parameters, which capture user characteristics such as reputation, likes, and activity levels, were used along with preferential attachment-based follower selection to generate the social network. At each tick, all agents were given the opportunity to act in a random order, possibly posting and/or reposting within the daily ranges established for their specific agent class. The network topology remained fixed during a simulation.
Conditions and measures
Based on this simulation, our goal was to assess whether removing accounts from a social network is effective in curbing the propagation of malicious content. For this, we compared the effectiveness of account removal at four different levels, namely removing 25%, 50%, 75%, and 99% of malicious accounts. The impacts of these manipulations were compared to a no-intervention baseline simulation to find the optimal level of account removal.
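The experimental design can be sketched as follows. The propagation logic below is a deliberately stripped-down stand-in for the full ABM, and all agent counts and activity levels are illustrative, not the exemplar’s parameters:

```python
# Toy sketch of the removal experiment: a share of "malicious" posters is
# removed before the run, and malicious-content propagation is compared
# across conditions. Illustrative only.
import random

def run_condition(removal_level, n_malicious=20, n_benign=80,
                  ticks=10, reposts_per_tick=200):
    n_mal = round(n_malicious * (1 - removal_level))
    posts = []
    for _ in range(ticks):
        posts += [{"kind": "malicious", "reposts": 0} for _ in range(n_mal)]
        posts += [{"kind": "benign", "reposts": 0} for _ in range(n_benign)]
        for _ in range(reposts_per_tick):  # PA-weighted reposting
            chosen = random.choices(posts,
                                    weights=[p["reposts"] + 1 for p in posts])[0]
            chosen["reposts"] += 1
    return sum(p["reposts"] for p in posts if p["kind"] == "malicious")

baseline = run_condition(0.0)  # no-intervention baseline
for level in (0.25, 0.50, 0.75, 0.99):
    print(f"{level:.0%} removal: malicious reposts {run_condition(level)} "
          f"vs baseline {baseline}")
```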
To assess the effectiveness of different levels of account removal, we operationalized the following prediction outcomes. Intervention impact in our case was the propagation of content, which we captured as the number of retweets (i.e., reposts) garnered by each post. Additionally, we looked at the agent class of the content-generating user.
Validation
We validated our model by plugging in data from real-world agents to see whether the behavioral patterns in our simulation are in line with reality. For internal validation, we conducted an examination using data derived from malicious campaigns. Information from 28 campaigns was utilized to develop the user typology through the initial cluster analysis, leaving a pool of 9 campaigns for subsequent validation (consistent with a 75–25 training-testing split). Within the latter subset, three campaigns emanated from the same geopolitical origin, specifically Russia. The data from these three campaigns were used to categorize users based on their identified agent types. Subsequently, agents of varying types were systematically removed from these campaigns, and the consequential impact on the overall propagation of content was recorded. This methodology was then mirrored in our ABM: agents of distinct types were systematically eliminated, and the resultant decline in overall content propagation per user in the simulation was found to be consistent with the patterns observed in the original campaign data. This internal validation process substantiates the fidelity of our ABM in replicating the dynamics observed in real-world malicious campaigns. More details on the internal validation can be found in Supplemental Appendix 5.
For the purpose of external validation, a custom agent named “CELEBRITY” was introduced into our network, designed to emulate the behavior of a real-life celebrity account. The selection of a celebrity agent was motivated by the notion that the actions of such prominent figures could have a ripple effect across the entire network, providing a robust benchmark for assessing the realism of our model. The real-life Twitter account of former President Donald Trump was chosen for data collection, considering its significant influence. Also, Twitter’s bot removal efforts in 2018 had an observable impact on the account’s content propagation pre versus post removal. To validate the model, we closely monitored changes in the CELEBRITY account’s statistics in real life and compared them with the corresponding agent in our simulation. We found that for our real-life CELEBRITY agent, the average drop in amplitudes of repost count received per post from before bot removal (June 15–July 30, 2018) to after bot removal (November 1–16, 2018) was 56.8%. Simulations were conducted at all four levels of bot removal (25%, 50%, 75%, and 99%), reflecting the uncertainty regarding the prevalence of bots in the Twitter network. Focusing on the repost levels of over 300 real-life tweets, we compared these with the simulated posts in the ABM to capture the propagation dynamics. The analysis revealed that the simulated bot removal at 50%, specifically, mirrored the real-life drop in repost amplitude levels, with other measures such as repost count, maximum repost count, minimum repost count, mean repost, repost standard deviation, and repost median also exhibiting distinctive similarities. Statistical rigor was applied through Welch’s t-tests.
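A minimal sketch of such a comparison, with placeholder arrays standing in for the actual real-life and simulated repost counts:

```python
# Sketch of the distribution comparison: Welch's t-test (unequal variances)
# between real-life and simulated repost counts. Arrays are placeholders.
import numpy as np
from scipy import stats

real_reposts = np.array([120, 85, 210, 40, 330])       # placeholder values
simulated_reposts = np.array([110, 95, 190, 55, 300])  # placeholder values

t_stat, p_value = stats.ttest_ind(real_reposts, simulated_reposts,
                                  equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```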
Results
Simulating and comparing four different levels of account removal using stratified ABM yielded the following results. First, compared to the no-intervention baseline, we observe that across all conditions, the removal of malicious accounts leads to a drop in retweets for all agent classes. The dampening effect on retweets is highest for content generated by popular agent classes (e.g., celebrities and influencers) and lower for less popular agent classes (e.g., commentators and amplifiers). Additional descriptive results are presented in Supplemental Appendix 7.
Table 2. Intervention effects on mean retweet count (combined PA × LIP, no PA, and no LIP).
PA: preferential attachment; LIP: limited information processing. The meaning of the bold values is indicated in the text.
Purposefulness of stratified ABM
Going beyond the usefulness of our model in predicting the impact of account removal on retweeting patterns, we want to discuss the benefits of codifying PA and LIP into ABM components regarding predictive power. Table 2 also summarizes the retweet distributions by agent class under the experimental conditions for an ABM with the same setup but without PA (rows marked as no PA).
Furthermore, Table 2 summarizes the retweet distributions by agent class under the experimental conditions for an ABM with the same setup but without LIP (rows marked as no LIP).
The results of our predictions using stratified ABM with PA and LIP highlight the importance of codifying the generative mechanisms to build a simulation model that mimics the real-world phenomenon of malicious content propagation in the Twitter network as closely as possible. While crucial for accurate predictions, the generative mechanisms as part of stratified ABM also advance our theoretical understanding of the drivers of the system dynamics under investigation. Thus, stratified ABM helps provide theoretical insight, which can in turn help improve the prescriptions of effective interventions to counter malicious content. For instance, since we have established through our stratified Twitter ABM that PA is the main driver behind malicious content propagation by influencer and celebrity accounts, who are responsible for a large share of malicious content propagation, we might consider dampening the PA effect in the content filtering algorithm.
Discussion
Given the pragmatic value of prediction for scholarship and practice, we set out to repurpose ABMs for use in non-positivist paradigms to enable prediction. Drawing upon the critical realist philosophy of science and the methodological literature on agent-based modeling, we developed a step-by-step approach which involves retroduction-based explanation formation and stratified ABM relying on ontologically separated generative mechanisms to simulate different potential states of a complex system. The critical step and methodological innovation in the approach lies in identifying the generative mechanisms––that is, the underlying causal powers––as ontologically distinct properties and codifying them into different structural and relational components of the ABM. The stratified ABM approach encompasses four steps toward prediction under the critical realist paradigm: (1) capturing the phenomenon, (2) identifying the generative mechanism, (3) building the ABM, and (4) simulating states of the system.
Our work contributes to scholarship in multiple ways. First, we address the need for prediction across research paradigms. Traditionally, there has been a split between research paradigms and the related methodologies for addressing problems of description and explanation and those that enable scholars to solve problems of prediction and prescription (Gregor, 2006). IS scholars have stressed the need to capture human agency, dynamism, adaptivity, contextualization, and the multi-level nature when studying phenomena of IS use (Avgerou, 2019; Brynjolfsson et al., 2021; Burton-Jones and Gallivan, 2007; Nan, 2011; Orlikowski, 1992; Volkoff et al., 2007). Despite the prevailing pragmatic need to estimate future and multi-level phenomena of IS use and their outcomes, established methods of prediction (e.g., predictive, econometric, and analytical modeling) do not align well with non-positivist paradigms as they would require scholars to loosen underlying assumptions and sacrifice or simplify the complexity of systems under investigation (Wynn and Williams, 2012). Stratified ABM allows researchers to uphold the contingent causality and open systems assumption central to philosophies like critical realism without the need to disentangle the research object from its defining context (cf. Avgerou, 2019 for the importance of contextualization in IS theorizing). As ABMs are slowly gaining traction in the IS and Management disciplines, scholarship has recognized the value of ABMs as a method in non-positivist paradigms in general (Nan, 2011) and critical realism specifically (Miller, 2015; Valogianni et al., 2023). These methodologists provide the theoretical and methodological groundwork for a step in the direction of making valid predictions in non-positivist paradigms. We draw on this line of work by borrowing from their underlying assumptions (Miller, 2015) and structural setup (Nan, 2011) and extend the literature by developing a step-by-step approach to move from empirical phenomenon to theoretical explanation, and finally, to prediction. In doing so, stratified ABM enables researchers to tackle problems of prediction from the perspective of a causal chain as opposed to limiting oneself to the empirical chain as in traditional prediction approaches. Thus, stratified ABM can help uncover the theoretical explanations for the rules which govern social interaction and the potential outcomes of these social interactions (i.e., self-organization, adaptivity; Drazin and Sandelands, 1992). Furthermore, the simulated states of the system using stratified ABM allow scholars to extend their predictions beyond directional effects to include effect magnitudes. Discerning effect magnitudes is important when simulations are used as instruments to inform policymaking. At a later stage, empirical data on the actualized events can be used to validate the model’s affinity to reality ex post.
Second, our research guides the application of ABMs to research questions that assume a separate (resp. stratified) ontology where causal powers only empirically manifest through the events they cause. Stratification in critical realism is two-dimensional: critical realists differentiate between ontological strata (i.e., the real, the actual, and the empirical; Bhaskar, 1975) and empirical strata (i.e., structure and agency; Archer, 1995; Volkoff et al., 2007). Hence, when using ABMs for critical realist inquiry, modeling requires researchers to explicitly add ontological stratification. Stratified ABM proposes distinct steps to identify and then codify generative mechanisms into ABM simulations. While this idea is not new and has been established in methodological articles aiming to connect agent-based modeling and critical realism (Miller, 2015), our work expands these approaches by making the codification of generative mechanisms an explicit and central step in the methodological approach. Finally, we offer a step-by-step approach to facilitate the search and modeling of causal mechanisms in ABM.
Third, stratified ABM has some specific methodological novelties. We underscore the importance of conceptually disentangling the phenomenon under investigation based on empirical data prior to setting up an ABM. Prior literature has suggested informing ABM simulations using abductive analysis of empirical data (Miller, 2015; Valogianni et al., 2023), yet our proposed approach takes this one step further. Specifically, we suggest that inductively decomposing the sequence of empirical events and then retroductively identifying the generative mechanisms at play helps understand the phenomenon and build more phenomenon-centric and thus realistic yet abstracted ABMs. Moreover, previous critical realist studies have mostly leveraged qualitative data in case study research (Wynn and Williams, 2020). We extend the methodological toolbox by demonstrating that generative mechanisms can as well be identified from large datasets of quantitative data (i.e., millions of Tweets and the respective engagement data). Thus, researchers may make use of the first two steps of our proposed approach for building explanatory theory from large datasets under a critical realist paradigm.
Our research is subject to limitations and paves the way for future research. First, the affinity to reality of our specific ABM may be further increased by inducing a dynamic network structure in which connections adapt over time. Second, while account removal strategies to combat malicious content propagation in social networks have found wide application in practice (e.g., Dwoskin, 2018; Harris et al., 2014), they have been subject to criticism. Common critique relates to the strategy potentially limiting the freedom of speech of network actors whose accounts get removed from the network (Hudson, 2018). Moreover, account removal has been found to be limited in its efficacy and sustainability: malicious actors have to be identified and removed manually, and nothing keeps them from opening a new account and re-entering the social network (Lazer et al., 2018). Stratified ABM and our exemplar study in the context of social networks suggest that drawing on generative mechanisms as the drivers of malicious content propagation may help researchers and practitioners alike develop new strategies to counter malicious behavior.

Predictions are a critical component of knowledge discovery, yet researchers make limited predictions, particularly those using non-positivist paradigms. Our work combines critical realism and ABM to propose a step-by-step approach that identifies and then codifies generative mechanisms into simulation models. The stratified ABM offers researchers a solution to make valid predictions without sacrificing or simplifying the complexity of the system they are investigating. Our work enables scholars to pursue predictive research using non-positivist paradigms, such as critical realism, while catering to their open system, dynamic, and adaptive premises.
Supplemental Material
Supplemental material for “A critical realist approach to agent-based modeling: Unlocking prediction in non-positivist paradigms” by Annamina Rieder, Saurav Chakraborty, Sandeep Goyal, and Donald J. Berndt (Journal of Information Technology) is available online.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Notes
Author biographies
References
