Sage Journals: Discover world-class research

Abstract

The robustness of qualitative comparative analysis (QCA) results ranks high on the agenda of methodologists and practitioners. This article aims to advance this debate on several fronts. First, in line with the extant literature, we take a comprehensive view of robustness, arguing that decisions on calibration, consistency, and frequency thresholds should all be tested. Second, we introduce the notion of “sensitivity range” as the range of values for any of these parameters within which the solution formula remains unchanged. Third, we argue that interpreting robustness is more intricate than simply checking whether solutions remain unchanged. Beyond sensitivity ranges, researchers should assess robustness by evaluating changes in parameters of fit and by classifying cases as robust, shaky, or possible. Fourth, we enable researchers to perform more than one robustness test at a time by proposing the notions of a “test set,” the overlap between the conceptually plausible alternative solutions that can be generated, and of a “robust core,” the part of a QCA solution that withstands the robustness checks. Fifth, we present functionalities implemented in the R package SetMethods that enable researchers to put our proposals into practice. These advancements are integrated into a comprehensive QCA Robustness Test Protocol consisting of three main tests: sensitivity ranges, fit-oriented robustness, and case-oriented robustness. We illustrate the protocol’s implementation with an example on high life expectancy across the globe.

Keywords

QCA, robustness, sensitivity ranges, fit-oriented robustness, case-oriented robustness

Robustness features high on the agenda in many empirical data analysis techniques. By and large, the larger the number of cases included in the analysis and the less intimate the case knowledge, the more discretionary analytic choices researchers have to make, and thus the more important it becomes to apply systematic robustness tests against such choices. This also holds for qualitative comparative analysis (QCA), especially when applied to more than a handful of cases. With a larger number of cases, algorithmic routines involving multiple, often discretionary, analytic decisions gain prominence over intimate case knowledge. There is already a literature on robustness in QCA. Some contributions primarily focus on the question of how robust QCA as a method is. For different conclusions on this question, see Arel-Bundock (2019); Baumgartner and Thiem (2017); Braumoeller (2015); Hug (2013); Krogslund, Choi, and Poertner (2015); Rohlfing (2015, 2018); and Thiem, Spöhel, and Duşa (2016). Others are more interested in how robust specific results in applied QCA are, for instance, Cooper and Glaesser (2015); Emmenegger, Schraff, and Walter (2014); Schneider and Wagemann (2012); Skaaning (2011); and Rutten (2020). Tests of QCA as a method often use simulated data, which are arguably superior to nonsimulated data from applied research (Rohlfing 2016; but see Lucas and Szatrowski 2014) simply because simulations ensure that the true data-generating process is known, so that deviations from this “truth” become measurable. In tests of results generated in applied QCA, in contrast, such truth is not known. The QCA Robustness Test Protocol that we present in this article is primarily for applied QCA researchers, but elements of it can, and we think should, inform tests of QCA as a method as well. In other words, our protocol aims at probing how robust a specific QCA solution, derived from empirical data, is.
This specific QCA result might or might not reflect the true data-generating process. Our goal, thus, is not to test whether QCA as a method is, in principle, capable of recovering the true data-generating process. Instead, the goal is to probe how robust given QCA results are in applied research settings.

Across both groups, those testing QCA as a method and those testing QCA results, a consensus seems to have emerged on which analytic decisions QCA should be robust against: changes in the raw consistency threshold, the frequency cutoff, the calibrations, and the set of cases or conditions under analysis. Authors in both groups also seem to agree by now on what can be considered a robust QCA result. Only a minority takes the demanding position that the Boolean expressions must remain identical (e.g., Krogslund et al. 2015). A more lenient and differentiated approach stipulates that the results at least remain in a set relation (e.g., Arel-Bundock 2019; Baumgartner and Thiem 2017; Schneider and Wagemann 2012; Thiem et al. 2016).¹ What the applied robustness literature has in common so far is that any of the proposed robustness tests is performed “by hand,” that is, tailored to specific examples and executed in often time-consuming (and convoluted) series of separate calculations and interpretations of the test results (e.g., Rutten 2020). The few software solutions that exist, such as the retention function in the R package QCA (Dusa 2018) or the packages QCAfalsePositive (Braumoeller 2015) and braQCA (Gibson and Burrell 2018), are either meant to test QCA as a method or confined to narrower notions of QCA solution robustness than what we conceptually propose and computationally implement in this article.

This article aims to push the debate ahead on several fronts.² First, in line with the extant literature, we take a comprehensive view of robustness challenges and argue that decisions on calibration, the raw consistency threshold, and the frequency threshold should all be tested. Second, focusing on the Boolean expression of the solution, we introduce the notion of “sensitivity range.” This is the range of values for the location of any one of the qualitative calibration anchors, the raw consistency threshold, or the frequency threshold within which the Boolean formula for the solution remains unchanged, if only this single parameter is changed and everything else in the analysis is left unaltered. Third, we argue that interpreting robustness tests is more intricate than simply checking whether the Boolean expressions of solutions remain unchanged. In addition to sensitivity ranges, researchers should combine what we call a fit-oriented and a case-oriented perspective on robustness. Along these lines, fourth, we propose the notions of a test set (TS) and a robust core (RC). The TS comprises all conceptually plausible alternative solutions, and we propose to aggregate it in two ways: using their intersection (minimal test set [minTS]) and their union (maximal test set [maxTS]). The RC is that part of a QCA solution that overlaps with the TS, that is, that withstands all of the robustness checks summarized in the TS. We show that the concepts of minTS/maxTS and RC enable researchers to perform more than one robustness test at a time. Contrasting the minTS/maxTS and the RC with the initial solution (IS) reveals which elements of the latter stand on shaky ground and which cases are deemed deviant or typical for nonrobust reasons.
Fifth, we present a set of functions implemented in the R package SetMethods (Oana and Schneider 2018) that enable researchers to put our proposals for robustness tests into practice, thus relegating cumbersome, time-consuming, and error-prone robustness checks “by hand” to the past. We illustrate the implementation of our QCA Robustness Test Protocol with a study on high life expectancy in countries across the globe (Paykani, Rafiey, and Sajjadi 2018).

Sensitivity Ranges

We label as IS the QCA result that researchers consider their best bet for substantive interpretation because it results from a carefully crafted and minimized truth table. The notion of IS is key to our entire QCA Robustness Test Protocol because it is the robustness of the IS that is tested.

The concept of sensitivity range refers to the range of changes to calibration, the raw consistency threshold, or the frequency cut within which the IS stays the same.³ Sensitivity ranges allow us to assess empirically the limits within which the Boolean expression for the solution remains unchanged. This robustness test gives us lower and upper bounds for the location of each calibration anchor of each condition and of the outcome, as well as for the frequency threshold and the raw consistency threshold. Knowing the size of these ranges tells us how far the IS depends on particular choices of calibration,⁴ raw consistency thresholds, or frequency cuts.

To obtain sensitivity ranges, we perform each possible change in turn (change of qualitative anchors, change in raw consistency, and change in frequency threshold) by gradually increasing and then gradually decreasing the value until the Boolean formula for the solution changes. Each change is made in isolation, holding all other parameters at their original values. Through this iterative procedure, we obtain the lower and upper limits for a particular change within which the IS stays the same.⁵ Therefore, in contrast to the other elements of the QCA Robustness Test Protocol introduced below, for sensitivity ranges we do not modify multiple parameters simultaneously. Instead, we focus only on the IS and find its robust boundaries by performing one change at a time. The sensitivity range is derived entirely from empirical findings. Considerations as to what a conceptually plausible range of analytic decisions looks like come into play only in other elements of our robustness protocol (see below). This implies that empirically derived sensitivity ranges could be either too narrow (there are conceptually plausible thresholds outside the range; scenarios 2 and 3 in Figure 1) or too broad (the QCA solution is robust even against conceptually implausible thresholds; scenario 1 in Figure 1).
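The one-at-a-time search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SetMethods implementation: `solve` is a hypothetical stand-in for the whole pipeline (calibration, truth table construction, minimization) that returns the solution formula as a string, and the step size and search bounds are choices the researcher would have to make. The `toy_solve` function, the parameter name `raw_consistency`, and the row consistencies are invented for illustration, and the toy skips Boolean minimization entirely.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def sensitivity_range(
    solve: Callable[[Params], str],
    params: Params,
    name: str,
    step: float,
    lower_bound: float,
    upper_bound: float,
) -> Tuple[float, float]:
    """Return (low, high): the values of params[name] within which the
    solution formula stays identical to the initial solution (IS).

    Only `name` is varied; all other parameters keep their original values.
    The bounds are only as precise as `step`.
    """
    initial = solve(params)
    original = params[name]

    def value_at(k: int) -> float:
        # k steps away from the original value; rounding avoids float drift.
        return round(original + k * step, 10)

    def unchanged(value: float) -> bool:
        trial = dict(params)  # copy, so the caller's parameters are untouched
        trial[name] = value
        return solve(trial) == initial

    # Gradually increase the parameter until the formula changes.
    up = 0
    while value_at(up + 1) <= upper_bound and unchanged(value_at(up + 1)):
        up += 1

    # Gradually decrease the parameter until the formula changes.
    down = 0
    while value_at(down - 1) >= lower_bound and unchanged(value_at(down - 1)):
        down -= 1

    return value_at(down), value_at(up)


# Hypothetical truth table rows and their raw consistency scores.
ROWS = {"A*B": 0.92, "A*~C": 0.86, "~B*C": 0.74}


def toy_solve(p: Params) -> str:
    # Toy stand-in for the QCA pipeline: the "solution" is simply the
    # disjunction of rows passing the threshold (no minimization).
    kept = sorted(row for row, cons in ROWS.items() if cons >= p["raw_consistency"])
    return " + ".join(kept)


print(sensitivity_range(toy_solve, {"raw_consistency": 0.80}, "raw_consistency", 0.01, 0.5, 1.0))
# → (0.75, 0.86)
```

With a hypothetical raw consistency threshold of 0.80, the toy solution stays unchanged between 0.75 and 0.86: at 0.87 one row drops out, and at 0.74 another enters. Because the search moves in fixed steps, the bounds are only as fine-grained as the chosen step.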

Since changes can be made to the raw consistency threshold, the frequency cut, and each qualitative anchor (the crossover point for crisp sets or the 0, 0.5, and 1 anchors for fuzzy sets) of each condition and the outcome, the number of sensitivity ranges needed to cover the full spectrum of perfect robustness in terms of the Boolean expression of the IS is $3 \times N_{sets} + 2$ when all sets are fuzzy and $N_{sets} + 2$ when all sets are crisp, where $N_{sets}$ denotes the number of sets (conditions and outcome) in the analysis. For instance, in a QCA with five fuzzy set conditions and a fuzzy outcome, there are $3 \times 6 + 2 = 20$ sensitivity ranges to be calculated, whereas in a QCA with five crisp set conditions and a crisp outcome, there would be $6 + 2 = 8$ sensitivity ranges.⁶ Below, we introduce a set of functions implemented in the R package SetMethods (Oana and Schneider 2018) that aid in calculating sensitivity ranges for QCA solutions.
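The counting rule can be written as a small helper. The mixed case (some fuzzy and some crisp sets in one analysis) is our own generalization of the two examples in the text, assuming that each set simply contributes its number of qualitative anchors.

```python
def n_sensitivity_ranges(n_fuzzy_sets: int = 0, n_crisp_sets: int = 0) -> int:
    """Number of one-at-a-time sensitivity ranges to compute.

    Each fuzzy set (condition or outcome) has three qualitative anchors
    (0, 0.5, 1); each crisp set has one crossover point. The extra two
    ranges are for the raw consistency and frequency thresholds.
    """
    return 3 * n_fuzzy_sets + n_crisp_sets + 2


# Five conditions plus the outcome give six sets.
print(n_sensitivity_ranges(n_fuzzy_sets=6))  # → 20
print(n_sensitivity_ranges(n_crisp_sets=6))  # → 8
```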

Calculating sensitivity ranges is an improvement in applied QCA, yet it suffers from two shortcomings. First, each sensitivity range is calculated in isolation. Each change (e.g., for a single calibration anchor in a single condition) is performed in turn, while all other parameters are kept at their values in the IS (e.g., the remaining calibration anchors, the raw consistency threshold, and the frequency threshold). What we obtain from this procedure is an answer to the following question: keeping all other analytic decisions fixed, what is the range of values for the analytic decision of interest within which the IS is insensitive? However, the challenge in gauging robustness in QCA lies in the fact that robustness can be threatened by multiple sources and that combinations of changes might matter. There are many analytic decisions a researcher has to make, most of which could be taken in a slightly different yet equally plausible manner. The qualitative anchors for one or more conditions or the outcome could be placed slightly differently, the raw consistency threshold could be set slightly higher or lower, or the frequency cutoff for truth table rows could be chosen slightly higher or lower.⁷ In principle and in practice, two such analytic changes, each within its own sensitivity range, can jointly trigger a change to the IS.⁸

Second, conceptually plausible analytic changes might lie outside the empirically derived sensitivity ranges. Imagine that the empirically derived sensitivity range for calibrating the set “tall person” runs from 195 to 210 cm. Yet, conceptually, it is plausible to argue that persons 190 cm tall (a value outside the sensitivity range) could be considered tall as well. This corresponds to scenario 2 in Figure 1. We argue that strong conceptual arguments override empirically derived sensitivity ranges: one should always perform robustness tests against conceptually plausible ranges. Conceptually plausible tests outside the empirical sensitivity range constitute hard tests (see Figure 1), and hard tests are to be preferred.

Figure 1.

Empirical and conceptual sensitivity ranges. Note: – – – = conceptual ranges; 1 = fully inside the empirical sensitivity range; 2 = partially inside; 3 = fully outside.

Fit Orientation and Case Orientation on Robustness

The sheer number of potential robustness threats in isolation already poses practical problems. They are partially tackled by the notion of sensitivity ranges just introduced and empirically illustrated below. The problem of manifold potential robustness threats is further exacerbated by the fact that alternative analytic decisions could also be taken in combination rather than in isolation. The number of different combinations of plausible robustness checks quickly makes currently available robustness test procedures and their interpretation spiral out of control. What is needed is an approach to robustness that can cope with multiple robustness checks at the same time and that summarizes the findings in a concise yet detailed manner. The next two steps in the robustness protocol are designed to implement such a strategy.

The IS, the minTS/maxTS, and the RC

We propose the following solution to this conundrum. Rather than contrasting the IS separately with each solution produced after each separate alteration (e.g., different qualitative anchors for some or all conditions or different raw consistency thresholds), we suggest aggregating all these alternative solutions into the so-called TS. The TS represents the space of possible solutions, that is, solutions that are generated based on alternative analytic parameters that fall within the range of substantive plausibility. Alternative solutions are aggregated into a minTS and a maxTS by taking their intersection and their union, respectively. The minTS represents the common denominator of all alternative solutions, that is, the area on which they all agree. The maxTS, in contrast, represents the entire area of possible solutions. Below, we explain why, in our test protocol, the IS is tested against both the minTS and the maxTS: each reveals a different piece of robustness-relevant information.

Apart from the IS and the minTS/maxTS, a third concept is crucial for understanding our QCA Robustness Test Protocol: the robust core (RC). The RC is obtained by intersecting the IS with all the alternative solutions, that is, with the minTS ($RC = IS * minTS$).⁹ The RC thus expresses that part of the initial QCA solution that is supported by all the robustness tests performed by the researcher.
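With fuzzy-set memberships, intersection and union are the case-wise minimum and maximum, so the minTS, the maxTS, and the RC can be sketched in a few lines of Python. This is an illustrative sketch, not the SetMethods code; the membership scores are hypothetical, and each inner list is assumed to hold one alternative solution's memberships for the same cases in the same order.

```python
from typing import List, Sequence, Tuple


def aggregate_test_set(
    alternatives: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return (minTS, maxTS) for case memberships in alternative solutions.

    minTS is the fuzzy intersection (case-wise minimum) and maxTS the
    fuzzy union (case-wise maximum) of all alternative solutions.
    """
    if not alternatives or len({len(alt) for alt in alternatives}) != 1:
        raise ValueError("need one or more alternatives covering the same cases")
    per_case = list(zip(*alternatives))  # one tuple of memberships per case
    return [min(m) for m in per_case], [max(m) for m in per_case]


def robust_core(is_memb: Sequence[float], min_ts: Sequence[float]) -> List[float]:
    """RC = IS * minTS: the case-wise minimum of IS and minTS."""
    if len(is_memb) != len(min_ts):
        raise ValueError("IS and minTS must cover the same cases")
    return [min(a, b) for a, b in zip(is_memb, min_ts)]


# Hypothetical memberships of four cases in three alternative solutions.
alternatives = [
    [0.8, 0.6, 0.2, 0.4],
    [0.7, 0.9, 0.1, 0.6],
    [0.9, 0.4, 0.3, 0.4],
]
is_memb = [0.8, 0.7, 0.6, 0.2]  # hypothetical memberships in the IS

min_ts, max_ts = aggregate_test_set(alternatives)
print(min_ts)                        # → [0.7, 0.4, 0.1, 0.4]
print(max_ts)                        # → [0.9, 0.9, 0.3, 0.6]
print(robust_core(is_memb, min_ts))  # → [0.7, 0.4, 0.1, 0.2]
```

Because the minTS is a case-wise minimum, the RC can never exceed either the IS or any single alternative solution for any case.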

Figure 2 visualizes our idea of robustness in QCA. It depicts the IS and the minTS/maxTS together with the outcome set Y. Depending on the relative size of specific areas, the robustness of the IS is higher or lower. For instance, if the IS, minTS, and maxTS fully overlapped, the IS would be fully robust against all of the changes tested in a particular study, and the RC would be identical to both the IS and the minTS/maxTS ($IS = minTS = RC = maxTS$). As soon as the overlap becomes less than perfect, robustness decreases. When the overlap between the IS and the minTS is less than perfect, the RC becomes smaller than the IS, the minTS, or both. When the overlap between the IS and the maxTS is less than perfect, the RC need not be affected, but new cases that are part of the maxTS become additional, possible cases to be taken into consideration for robustness, as we shall see below when looking at case-oriented robustness. Therefore, departures from the ideal situation in which $IS = minTS = maxTS$ are problematic not only in terms of the size of the RC but also in terms of the general degree of overlap between the IS and the minTS/maxTS. In the fit-oriented robustness section below, we introduce a series of robustness parameters that quantify the different kinds of departures from the ideal situation.

Figure 2.

Initial solution, minimal and maximal test set, and the robust core. Note: minTS = minimal test set; maxTS = maximal test set; RC = robust core; IS = initial solution; Y = outcome.

Finally, Figure 2 also displays the outcome set. This is important both for our fit-oriented and our case-oriented robustness perspectives that we introduce in the next sections.

Fit-Oriented Robustness

The fit-oriented perspective compares how well the RC fares relative to the IS and whether, and to what extent, the minTS, maxTS, and IS are in a subset relation to one another. In essence, this step in our protocol compares parameters of fit (consistency, coverage, and set coincidence) for the IS, the RC, and the minTS/maxTS. We introduce here four such robustness fit (RF) parameters that enable applied researchers to make these comparisons.

The first two parameters express how well the RC, that is, the part of the solution that withstands all changes, fares in terms of consistency and coverage compared with the entire IS. Calculating the parameters of fit for the RC and the IS, respectively, is straightforward: we use the standard formulas for consistency and coverage (e.g., Ragin 2006; Schneider and Wagemann 2012). Our RF parameters are then the ratios of the respective parameter values for the RC and the IS. The more the IS and RC overlap, the higher the value of these parameters and the higher the robustness of the IS.

For consistency, the RF parameter ($RF_{cons}$) is calculated with equation 1, which divides the consistency of the IS by that of the RC. Because, by definition, the RC is a subset of the IS, $RF_{cons}$ varies from 0 to 1, with higher values indicating more robust solutions.¹⁰ Notice that $RF_{cons}$ takes a value of 1 when the RC is identical to the IS ($RC = IS$). That is, it reaches its maximum either when the IS and the minTS perfectly overlap or when the IS is a perfect subset of the minTS, so that the IS is identical to the RC. Consequently, $RF_{cons}$ would be less than 1 not only when the IS and the minTS only partially overlap but also when the minTS is a subset of the IS and, therefore, the IS is not identical to the RC but a superset of it (in both situations, $IS \neq RC$).

$$RF_{cons} = \frac{Cons_{IS}}{Cons_{RC}} \qquad (1)$$

Similarly, for coverage, $RF_{cov}$ is calculated with equation 2 by dividing the coverage of the RC by that of the IS.¹¹ As with $RF_{cons}$, $RF_{cov}$ takes its maximum value of 1 when $RC = IS$, that is, either when $IS = minTS$ or when the IS is a subset of the minTS.

$$RF_{cov} = \frac{Cov_{RC}}{Cov_{IS}} \qquad (2)$$
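Equations 1 and 2 can be sketched as follows, using the standard sufficiency consistency, $\sum \min(X_i, Y_i) / \sum X_i$, and coverage, $\sum \min(X_i, Y_i) / \sum Y_i$ (Ragin 2006). This is an illustration, not the SetMethods code; the memberships are hypothetical, and in this example the RC is a strict subset of the IS, so both ratios fall below 1.

```python
from typing import Sequence


def consistency(x: Sequence[float], y: Sequence[float]) -> float:
    """Consistency of x as sufficient for y: sum(min(x, y)) / sum(x)."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(x)


def coverage(x: Sequence[float], y: Sequence[float]) -> float:
    """Coverage of y by x: sum(min(x, y)) / sum(y)."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(y)


def rf_cons(is_memb: Sequence[float], rc_memb: Sequence[float], y: Sequence[float]) -> float:
    """Equation 1: consistency of the IS divided by consistency of the RC."""
    return consistency(is_memb, y) / consistency(rc_memb, y)


def rf_cov(is_memb: Sequence[float], rc_memb: Sequence[float], y: Sequence[float]) -> float:
    """Equation 2: coverage of the RC divided by coverage of the IS."""
    return coverage(rc_memb, y) / coverage(is_memb, y)


# Hypothetical memberships; rc_memb is the case-wise minimum of IS and minTS.
is_memb = [0.8, 0.7, 0.6, 0.2]
rc_memb = [0.7, 0.4, 0.1, 0.2]
y = [0.9, 0.8, 0.3, 0.1]

print(round(rf_cons(is_memb, rc_memb, y), 3))  # → 0.89
print(round(rf_cov(is_memb, rc_memb, y), 3))   # → 0.684
```

When the RC equals the IS, both functions return exactly 1, matching the maximum described in the text.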

In addition to these two parameters, two further parameters of fit, $RF_{SC\_minTS}$ and $RF_{SC\_maxTS}$, are needed to evaluate the degree of overlap between the IS and the minTS/maxTS. Even if the IS perfectly coincides with the RC, the minTS and maxTS might still include other, possible cases that the IS does not cover. For these third and fourth parameters, we propose to use the idea of set coincidence by Ragin and Fiss (2016). Applied to our purpose, it expresses the degree of overlap between the IS and the minTS/maxTS relative to the entire space covered by either the IS or the minTS/maxTS (see equations 3 and 4).

$$RF_{SC\_minTS} = \frac{\sum_i \min(IS_i, minTS_i)}{\sum_i \max(IS_i, minTS_i)} = \frac{\sum_i RC_i}{\sum_i \max(IS_i, minTS_i)} \qquad (3)$$

$$RF_{SC\_maxTS} = \frac{\sum_i \min(IS_i, maxTS_i)}{\sum_i \max(IS_i, maxTS_i)} \qquad (4)$$
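Set coincidence, as used in equations 3 and 4, is the sum of case-wise minima over the sum of case-wise maxima. A minimal sketch with hypothetical memberships (not the SetMethods code):

```python
from typing import Sequence


def set_coincidence(a: Sequence[float], b: Sequence[float]) -> float:
    """Set coincidence: sum of case-wise minima over sum of case-wise maxima."""
    overlap = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    return overlap / union


is_memb = [0.8, 0.7, 0.6, 0.2]  # hypothetical IS memberships
min_ts = [0.7, 0.4, 0.1, 0.4]   # hypothetical minTS memberships
max_ts = [0.9, 0.9, 0.3, 0.6]   # hypothetical maxTS memberships

print(round(set_coincidence(is_memb, min_ts), 3))  # RF_SC_minTS → 0.56
print(round(set_coincidence(is_memb, max_ts), 3))  # RF_SC_maxTS → 0.667
```

The numerator for the minTS comparison equals the sum of RC memberships, which is the identity stated in equation 3.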

All four parameters range from 0 to 1, with higher values indicating higher fit robustness. They are continuous measures with no clear threshold above which an IS is to be considered robust and below which it is not. Our RF parameters share this lack of clear thresholds with several other parameters in QCA (consistency, coverage, and relevance of necessity [RoN]), and just as with those parameters, we suggest that a case-oriented perspective must complement the fit-oriented numerical assessment of robustness. We develop this idea in the following section.

Case-Oriented Robustness

Robustness parameters of fit allow us to assess the degree of overlap between the IS, RC, minTS, and maxTS. What they hide is which cases, and how many, lower these parameters by turning from typical into deviant cases, or vice versa, when alternative solutions are created. We consider it robustness-relevant information to reveal which cases are typical and which are deviant for the IS and for the minTS/maxTS. A case-oriented perspective on robustness is fully in line with recent calls in the QCA robustness literature to integrate case-based knowledge rather than rely simply on a numerical perspective and decontextualized truth tables (Rutten 2020). This perspective is paramount because one key goal of QCA is to correctly classify cases according to their membership in recipes for an outcome. To explain the logic of the case-oriented perspective on robustness, we offer two complementary forms of presentation: an updated version of the Venn diagram in Figure 2 and an XY plot. In addition, we introduce three further robustness parameters that take into account the ratio of typical and deviant cases in the IS and the minTS/maxTS, respectively.

Venn diagrams and types of robustness-relevant cases

Figure 3 is an updated version of Figure 2. Each area in the diagrams for the minTS (left side) and the maxTS (right side) denotes a robustness-relevant type of case. Each type contributes different information to the question of how robust the IS is.

Looking at the left-side diagram, we see the various intersections between the minTS, the IS, and the outcome Y. Irrespective of the cases’ membership in the outcome, none of the cases located in the RC poses a concern for case-oriented robustness, because both the IS and each alternative solution aggregated in the minTS agree on these cases’ classification as either typical or deviant consistency in kind cases (Schneider and Rohlfing 2013). We label cases in these areas as robust typical (Y) and robust deviant consistency ($\sim Y$). In fact, if all typical and deviant consistency cases were of the robust type, our IS would be fully robust from a case-oriented perspective. All cases covered by the IS but not by the minTS carry the adjective “shaky”: shaky typical cases (Y) and shaky deviant consistency cases ($\sim Y$). They are shaky because they appear as typical and deviant consistency cases, respectively, only from the perspective of the IS. As soon as plausible alterations to the analytic setup are allowed when producing the minTS, these cases change their status.

Note that we use the minTS for defining robust and shaky cases because, compared with the maxTS, this provides a harder robustness test. Since the minTS (an intersection) can never be larger than the maxTS (the corresponding union) and is usually smaller, it is harder for cases to be classified as robust and easier for them to be classified as shaky.

Along the same lines, using the maxTS to identify the so-called possible cases provides a harder robustness test. Looking at the right-side diagram, all cases covered by the maxTS but not by the IS carry the adjective “possible”: possible typical cases (Y) and possible deviant consistency cases ($\sim Y$). They are possible because, from the perspective of plausible alternative solutions to the IS, they could be typical or deviant consistency cases. We consider all cases covered by the union of alternative solutions (the maxTS) but not by the IS as possible ones because we want to take into consideration all cases that might have become relevant for the analysis had one solution been chosen over another.

Cases outside both the maxTS and the IS, irrespective of whether they are members of the outcome Y, are of little concern for assessing the robustness of the IS. We label them extreme deviant coverage (Y) and irrelevant ($\sim Y$) cases. Irrelevant cases are of no interest for robustness. Extreme deviant coverage cases $(\sim IS, \sim maxTS, Y)$ do not convey direct information on the robustness of the IS but call into question the broader research design that produced the IS. Apparently, there are cases that are members of the outcome for reasons outside the scope not only of the IS but also of any conceivable variation of the IS as captured by the maxTS.¹²
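The case types in Figure 3 can be assigned mechanically from each case's memberships. The sketch below is illustrative rather than the SetMethods implementation, and it assumes that a membership above 0.5 counts as being in a set, in line with the $Y > 0.5$ convention for outcome membership in the XY plots; cases with a membership of exactly 0.5 would need separate treatment. All membership scores are hypothetical.

```python
def case_type(is_m: float, min_ts: float, max_ts: float, y: float) -> str:
    """Classify one case by its memberships in IS, minTS, maxTS, and Y.

    Assumption for this sketch: membership > 0.5 means "in the set".
    """
    in_is, in_min, in_max, in_y = (m > 0.5 for m in (is_m, min_ts, max_ts, y))
    kind = "typical" if in_y else "deviant consistency"
    if in_is and in_min:
        return f"robust {kind}"    # in the RC: IS and minTS agree
    if in_is:
        return f"shaky {kind}"     # covered by the IS but not the minTS
    if in_max:
        return f"possible {kind}"  # covered by the maxTS but not the IS
    # Outside both the IS and the maxTS.
    return "extreme deviant coverage" if in_y else "irrelevant"


cases = {  # name: (IS, minTS, maxTS, Y), hypothetical memberships
    "c1": (0.8, 0.7, 0.9, 0.9),
    "c2": (0.7, 0.4, 0.9, 0.8),
    "c3": (0.6, 0.1, 0.3, 0.3),
    "c4": (0.2, 0.4, 0.6, 0.1),
    "c5": (0.1, 0.1, 0.2, 0.7),
}
for name, memberships in cases.items():
    print(name, case_type(*memberships))
# c1 robust typical
# c2 shaky typical
# c3 shaky deviant consistency
# c4 possible deviant consistency
# c5 extreme deviant coverage
```

Checking the IS before the maxTS mirrors the text: a case covered by the IS but not the minTS is shaky, whatever its maxTS membership.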

Figure 3.

Robustness-relevant case types.

In addition to visualizing which specific cases fall into which category, we propose to also use the relative frequencies of the different case types to express the robustness of the IS. There are two robustness case ratios: one for typical cases and one for deviant consistency in kind cases. The robustness case ratio for typical cases ($RCR_{typ}$) and for deviant consistency cases ($RCR_{dev}$), respectively, is the number of robust typical (robust deviant consistency) cases divided by the number of all typical (all deviant consistency) cases, that is, robust, shaky, and possible ones combined. Equation 5 calculates the robustness case ratio for typical cases and equation 6 for deviant consistency cases. Presenting this information is one further tool for maintaining and strengthening the case-oriented nature of QCA, especially when the overall number of cases in a study puts this case orientation under stress.

$$RCR_{typ} = \frac{\text{robust}_{\text{typical},\,minTS}}{\text{shaky}_{\text{typical},\,minTS} + \text{possible}_{\text{typical},\,maxTS} + \text{robust}_{\text{typical},\,minTS}} \qquad (5)$$

$$RCR_{dev} = \frac{\text{robust}_{\text{dev.cons.},\,minTS}}{\text{shaky}_{\text{dev.cons.},\,minTS} + \text{possible}_{\text{dev.cons.},\,maxTS} + \text{robust}_{\text{dev.cons.},\,minTS}} \qquad (6)$$
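Given a list of case-type labels, equations 5 and 6 reduce to simple counts. In this sketch, the label strings and the NaN return value for a kind with no cases at all are assumptions, since the text does not say how an undefined ratio should be reported.

```python
from collections import Counter
from typing import Iterable


def robust_case_ratio(case_types: Iterable[str], kind: str) -> float:
    """Equations 5 and 6: robust / (shaky + possible + robust) for one kind.

    `kind` is "typical" (RCR_typ) or "deviant consistency" (RCR_dev).
    Returns NaN when there are no cases of that kind (ratio undefined).
    """
    counts = Counter(case_types)
    robust = counts[f"robust {kind}"]
    total = robust + counts[f"shaky {kind}"] + counts[f"possible {kind}"]
    return robust / total if total else float("nan")


types = [  # hypothetical classification of five cases
    "robust typical", "shaky typical",
    "shaky deviant consistency", "possible deviant consistency",
    "extreme deviant coverage",
]
print(robust_case_ratio(types, "typical"))              # RCR_typ → 0.5
print(robust_case_ratio(types, "deviant consistency"))  # RCR_dev → 0.0
```

Extreme deviant coverage and irrelevant cases enter neither ratio, consistent with their role in the text.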

XY plots and case-oriented robustness

Venn diagrams like those in Figure 3 are particularly useful for visualizing our conceptual idea of case types. They are less well suited for depicting concrete empirical situations in a given QCA study, especially when fuzzy sets are involved. For this, XY plots can be used. Figure 4 presents an XY plot with case memberships in a hypothetical initial sufficient solution (IS) on the X-axis and each case’s membership in the minTS/maxTS on the Y-axis. Depending on the case type, membership in either the minTS or the maxTS determines a case’s position in the plot. Full-circle markers indicate cases that are members of the outcome ($Y > 0.5$) and empty-circle markers cases with $Y < 0.5$. Just like in a standard XY plot, the superimposed 2 × 2 table visualizes differences in kind between cases. The upper-right quadrant displays our robust case types, which are members of both the minTS and the IS (typical [Y] or deviant consistency [$\sim Y$] cases). The lower-left quadrant displays our extreme deviant coverage (Y) and irrelevant ($\sim Y$) cases, which are nonmembers of both the maxTS and the IS. Of primary concern for robustness are the two gray-shaded quadrants in Figure 4. They indicate cases that are covered by the IS only (lower-right quadrant) or by the maxTS only (upper-left quadrant), that is, our “shaky” and “possible” typical and deviant consistency cases.

Figure 4.

The subset relationship between different solution formulas.

If all cases lie exactly on the diagonal, then the IS, minTS, and maxTS perfectly overlap according to the relevant case types for the minTS or the maxTS, respectively. Our fit-oriented parameters would yield values close to their maximum value of 1, and our case-oriented parameters would be exactly at their maximum value of 1.¹³ If all cases lie above the diagonal, then the IS is a perfect subset of the maxTS. This would happen, for instance, if the robustness tests consisted exclusively of lowering the raw consistency threshold for the conservative solution, thereby including more truth table rows and, consequently, covering more cases. In this scenario, all relevant nonrobust cases are possible cases that were not initially covered by the IS.¹⁴ $RF_{cons}$ and $RF_{cov}$ would again yield values of (or close to) 1, indicating that $IS = RC$ and hence that all cases covered by the IS are robust. However, since the maxTS would not perfectly coincide with the IS, $RF_{SC\_maxTS}$ and at least one of the robustness case ratio (RCR) parameters will yield values smaller than 1. This is because there would be some possible typical cases ($RCR_{typ} < 1$), some possible deviant consistency cases ($RCR_{dev} < 1$), or both. The value of $RF_{SC\_minTS}$ cannot be predicted from these types of cases.

Alternatively, if all cases lie below the diagonal, then the IS is a superset of the minTS. In this situation, all relevant nonrobust cases are shaky cases; in other words, the IS covers cases that are nonrobust. $RF_{cons}$, $RF_{cov}$, and $RF_{SC\_minTS}$ would all yield values smaller than 1. This happens because the IS neither perfectly coincides with the RC, as it includes nonrobust cases (hence, $RF_{cons}$ and $RF_{cov}$ are less than 1), nor perfectly coincides with the minTS (hence, $RF_{SC\_minTS}$ is less than 1). The value of $RF_{SC\_maxTS}$ cannot be predicted from these types of cases.

In general, the farther cases are from the diagonal, the smaller at least some of our RF parameters become.¹⁵ The more cases lie in the “forbidden quadrants” (upper-left and lower-right), the lower our case-oriented parameters. Moreover, if there are cases in both of these areas simultaneously, the IS both contains shaky cases and omits possible cases.

The literature on robustness in QCA emphasizes subset relations between solutions and argues that solutions in a subset relation ought to be considered more robust than those that are not. We further specify this subset notion of QCA solutions into a robustness case rank ($RCC\_Rank$) that classifies the relation between the IS and the minTS/maxTS into four scenarios (see formula 7). Based on our classification of cases, only some cases above and below the diagonal are relevant: the shaky and the possible case types. A violation of the subset relation between the IS and the TS exists when there are both shaky and possible cases. We therefore classify this situation as the worst ($RCC\_Rank = 4$). Conversely, when there are neither shaky nor possible cases, $RCC\_Rank$ takes the value of 1, indicating the best possible subset relation between the TS and the IS according to the relevant case types. $RCC\_Rank$ is 2 when there are possible but no shaky cases and 3 when there are shaky but no possible cases. We consider shaky cases a worse violation of robustness than possible cases because they are nonrobust cases that are part of the IS whose robustness is being evaluated.

$$RCC_{Rank} = \begin{cases} 1, & \text{if } \sum \text{shaky cases}(minTS) = 0 \text{ AND } \sum \text{possible cases}(maxTS) = 0 \\ 2, & \text{if } \sum \text{shaky cases}(minTS) = 0 \text{ AND } \sum \text{possible cases}(maxTS) \neq 0 \\ 3, & \text{if } \sum \text{shaky cases}(minTS) \neq 0 \text{ AND } \sum \text{possible cases}(maxTS) = 0 \\ 4, & \text{if } \sum \text{shaky cases}(minTS) \neq 0 \text{ AND } \sum \text{possible cases}(maxTS) \neq 0 \end{cases} \tag{7}$$
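Because formula 7 depends only on whether shaky and possible cases exist, it can be written as a small lookup. The base-R sketch below is our own illustration, not the SetMethods implementation; the function name rcc_rank() and its arguments are hypothetical.

```r
# Hypothetical helper mirroring formula 7 (not part of SetMethods).
# n_shaky:    number of shaky cases (in IS but not in minTS)
# n_possible: number of possible cases (in maxTS but not in IS)
rcc_rank <- function(n_shaky, n_possible) {
  stopifnot(n_shaky >= 0, n_possible >= 0)
  if (n_shaky == 0 && n_possible == 0) return(1L)  # no violation at all
  if (n_shaky == 0) return(2L)                     # only possible cases
  if (n_possible == 0) return(3L)                  # only shaky cases
  4L                                               # both shaky and possible cases
}

rcc_rank(n_shaky = 0, n_possible = 0)
rcc_rank(n_shaky = 0, n_possible = 3)
rcc_rank(n_shaky = 2, n_possible = 0)
rcc_rank(n_shaky = 5, n_possible = 12)
```

The order of the checks encodes the judgment that shaky cases are worse than possible cases: a solution with only shaky cases lands on 3, one with only possible cases on 2.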

In sum, our case-oriented perspective resonates both with QCA’s case-based origins and with a focus on correct case classification, which QCA shares with a wide array of statistical and machine-learning procedures (Muchlinski, Siroky, and Kocher 2016). Both the theory and the practice of QCA robustness tests have long ignored this vital component.

The QCA Robustness Test Protocol in a Nutshell

Applied QCA researchers have a total of seven parameters at their disposal. These parameters and their meaning are summarized in Table 1. Four of them approach robustness from the perspective of parameters of fit, and three from the perspective of cases and their classification as typical or deviant. The first six indicate maximum robustness with a value of 1 and lower robustness with values below 1. The seventh, $RCC_{Rank}$, sorts robustness scenarios into ranks from 1 to 4, from best to worst. In principle, researchers hope that all parameters reach the maximum or at least come close to it. In practice, this may often not be the case.

Table 1.

Qualitative Comparative Analysis Robustness Formulas.

Parameter	Meaning
Fit-oriented parameters
$RF_{cons}$	Is $IS$ fully consistent with $RC$?
$RF_{cov}$	Does $IS$ cover the same as $RC$?
$RF_{SC\_minTS}$	Does $IS$ coincide with $minTS$?
$RF_{SC\_maxTS}$	Does $IS$ coincide with $maxTS$?
Case-oriented parameters
$RCR_{typ}$	Are all typical cases robust?
$RCR_{dev}$	Are all deviant consistency in kind cases robust?
$RCC_{Rank}$	Do case classifications not violate subset relations with $minTS$ (shaky) and $maxTS$ (possible)?

Note: IS = initial solution; RC = robust core; TS = test set; RF = robustness fit; RCR = robustness case ratio; RCC = robustness case classifications.

We now have all the tools at our disposal to integrate our conceptualization and operationalization of QCA robustness tests in the form of a protocol. Applied QCA researchers who aim to test the robustness of their findings should follow the steps below. This protocol enables researchers to test the robustness of their findings against multiple alternative analytic decisions (calibration, raw consistency threshold, and frequency cut), both individually (using sensitivity ranges) and in combination (using the $TS$ and the $RC$), for both crisp and fuzzy set data.¹⁶ Additionally, the protocol allows robustness to be evaluated in a multifaceted way by analyzing the consequences of alternative analytic decisions from both a fit perspective and a case perspective. Researchers can thus provide a nuanced answer to the question of how robust their findings are if they follow the protocol:

1. Produce the $IS$.

2. Determine the sensitivity ranges for all relevant analytic decisions in isolation.

3. Produce alternative solutions for the various analytic changes considered. In practice, we advise researchers to build tests that are as challenging as conceptually plausible by:

• varying parameters not only within sensitivity ranges but at the margins of the “hard test range” of conceptual plausibility (see Figure 1);

• building alternative solutions by combining the analytic changes in this hard test range.

4. Obtain the $TS$ and the $RC$:

• Intersect all alternative solutions in the $TS$ to obtain $minTS$.

• Create the union of all alternative solutions in the $TS$ to obtain $maxTS$.

• Intersect the $IS$ and all alternative solutions in the $TS$ to obtain the $RC$.

5. Calculate the fit-oriented parameters ($RF$) to evaluate the overlap between $IS$, $RC$, and $minTS/maxTS$.

6. Calculate the case-oriented robustness parameters ($RCR$), identify robustness-relevant types of cases, and determine the robustness case rank ($RCC_{Rank}$) of the relation between the $IS$ and $minTS/maxTS$.

7. Interpret the robustness results (including identifying the “hardest” test solution used).
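For fuzzy-set solutions, the set operations in Step 4 reduce to element-wise minima (intersection) and maxima (union) of the cases' membership scores in each solution. The base-R sketch below illustrates this with hypothetical membership scores for five cases; the vectors are invented for illustration, whereas in practice the scores would come from the fitted solutions.

```r
# Hypothetical membership scores of five cases (illustration only)
is_memb <- c(0.9, 0.7, 0.6, 0.3, 0.1)  # initial solution (IS)
ts1     <- c(0.8, 0.9, 0.4, 0.7, 0.2)  # alternative test solutions
ts2     <- c(0.7, 0.6, 0.8, 0.6, 0.1)
ts3     <- c(0.9, 0.8, 0.3, 0.5, 0.4)
ts_list <- list(ts1, ts2, ts3)

# Fuzzy intersection = element-wise minimum; fuzzy union = element-wise maximum
min_ts <- do.call(pmin, ts_list)   # minTS: intersection of all test solutions
max_ts <- do.call(pmax, ts_list)   # maxTS: union of all test solutions
rc     <- pmin(is_memb, min_ts)    # RC: IS intersected with every test solution

rbind(IS = is_memb, minTS = min_ts, maxTS = max_ts, RC = rc)
```

Because the RC is built with pmin() over the IS, no case can have a higher membership in the RC than in the IS, which is why the RC is either identical to the IS or a subset of it.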

The QCA Robustness Test Protocol in Practice

To illustrate how our seven-step QCA Robustness Test Protocol is put into practice using the R package SetMethods, v.3.0 (Oana and Schneider 2018), we use the study by Paykani et al. (2018) on the conditions explaining high life expectancy in 131 countries around the globe. They use fuzzy-set data on the conditions high quality education ($HE$), good governance ($GG$), affluent health system ($AH$), high income inequality ($HI$), and high wealth ($HW$) to explain the outcome set of high life expectancy ($HL$).¹⁷

Step 1: Produce the IS

For the purpose of this illustration, we create an initial parsimonious solution using a raw consistency threshold of incl.cut = 0.87 and a frequency cut of n.cut = 2; that is, truth table rows with fewer than two cases are treated as logical remainders. The initial parsimonious solution ($IS$) indicates that having $GG$ and $AH$, or having $HE$, not $GG$, $HI$, and $HW$, is sufficient for $HL$ ($GG \times AH + HE \times \sim GG \times HI \times HW \to HL$).¹⁸

Step 2: Determine the Sensitivity Ranges

Sensitivity ranges for calibration can be calculated using the function rob.calibrange(). This function takes the condition indicated in the option test.cond.raw, initially calibrated with the thresholds in test.thresholds, and shifts one anchor at a time by the value indicated in the option step. The goal is to find the lower and upper bounds within which the solution stays the same, while keeping constant all other parameters that produced the $IS$.¹⁹ The function calculates a separate lower and upper bound for each of the three anchors by repeatedly subtracting or adding the step value, for at most the number of runs indicated by the option max.runs.²⁰

library(SetMethods)  # provides the rob.*() robustness functions used below

# PAYR: raw data; PF: calibrated fuzzy-set data (both created beforehand)
rob.calibrange(raw.data = PAYR,
  calib.data = PF,
  test.cond.raw = "WEAL",
  test.cond.calib = "HW",
  test.thresholds = c(3000, 10500, 28500),
  type = "fuzzy",
  step = 500,
  max.runs = 40,
  outcome = "HL",
  conditions = c("HE","GG","AH","HI","HW"),
  incl.cut = 0.87,
  n.cut = 2,
  include = "?")

Exclusion: Lower bound 3000 Threshold 3000 Upper bound 5500
Crossover: Lower bound 10000 Threshold 10500 Upper bound 11000
Inclusion: Lower bound 10500 Threshold 28500 Upper bound NA

We see that for condition $HW$, the 0.5 crossover anchor can be located anywhere between 10,000 (lower bound) and 11,000 (upper bound) without changing the $IS$. As indicated by “NA” in the output, the function sometimes cannot find the lower and/or upper bound within the number of runs indicated by max.runs. An “NA” can therefore mean that the number of runs initially selected was too low, and increasing the maximum number of runs can help eliminate these unknown bounds. For example, for the full-membership anchor (1), we only manage to identify a lower bound (at 10,500). For the upper bound, the software tested values up to 48,500, but the $IS$ still remained unchanged.²¹ If enough runs were allowed to exceed the empirically available range of the data, then “NA” means that no (upper and/or lower) bound can be identified with the data at hand; that is, the solution remains unchanged across the entire empirical range of the raw data. Generally, if the output still reports “NA” after the number of runs has been increased considerably, this can be taken as evidence of a large sensitivity range.
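The bound search described above can be sketched in base R as follows. This is our reading of the described logic, not the source code of rob.calibrange(); the helper find_bounds() and the predicate same_solution() are hypothetical. In a real analysis, same_solution() would recalibrate the condition with the candidate anchor, rerun the truth table minimization, and compare the result with the $IS$; here two toy predicates stand in for that step.

```r
# Hypothetical sketch: step away from a threshold in both directions and keep
# the last value for which the solution is unchanged.
# same_solution(value) must return TRUE if the solution obtained with `value`
# equals the IS.
find_bounds <- function(threshold, step, max_runs, same_solution) {
  walk <- function(direction) {
    last_ok <- threshold
    for (run in seq_len(max_runs)) {
      candidate <- threshold + direction * run * step
      if (!same_solution(candidate)) return(last_ok)  # solution changed here
      last_ok <- candidate
    }
    NA  # no change detected within max_runs steps
  }
  c(lower = walk(-1), upper = walk(1))
}

# Toy predicate: the solution is assumed to stay unchanged for crossover
# values between 10,000 and 11,000.
crossover_ok <- function(v) v >= 10000 && v <= 11000
find_bounds(threshold = 10500, step = 500, max_runs = 40, same_solution = crossover_ok)

# Toy predicate that never changes at or above 10,500: the upper bound stays NA.
inclusion_ok <- function(v) v >= 10500
find_bounds(threshold = 28500, step = 500, max_runs = 40, same_solution = inclusion_ok)
```

With step = 500 and max.runs = 40, the last value tested upward from 28,500 is 28,500 + 40 × 500 = 48,500, the same ceiling mentioned above for the full-membership anchor; an NA therefore only says that no change occurred up to that point.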

Functions rob.inclrange() and rob.ncutrange() find similar ranges for the raw consistency threshold and the frequency cut, respectively, by gradually modifying them by a step value. We learn that our $IS$ is rather sensitive to changes in both the raw consistency threshold and the frequency (n.cut) threshold. The $IS$ stays the same only for raw consistency values between 0.85 and 0.87, whereas any frequency threshold other than two changes our $IS$.

rob.inclrange(data = PF,
  step = 0.01,
  max.runs = 20,
  outcome = "HL",
  conditions = c("HE","GG","AH","HI","HW"),
  incl.cut = 0.87,
  n.cut = 2,
  include = "?")

Raw Consistency T.: Lower bound 0.85 Threshold 0.87 Upper bound 0.87

rob.ncutrange(data = PF,
  step = 1,
  max.runs = 20,
  outcome = "HL",
  conditions = c("HE","GG","AH","HI","HW"),
  incl.cut = 0.87,
  n.cut = 2,
  include = "?")

N.Cut: Lower bound 2 Threshold 2 Upper bound 2

Step 3: Produce Alternative Solutions, Taking Into Consideration the Sensitivity Range Analysis and Conceptually Plausible Changes in the Hard Test Range

To analyze how multiple changes affect the $IS$ and to implement the notion of the $RC$, we first create, say, three additional solutions, each of which alters one or more of the analytic decisions made for the $IS$. In alternative test solution 1 ($TS1$), we lower the raw consistency threshold from 0.87 to 0.75; in test solution 2 ($TS2$), we change the calibration anchors for the set $HW$ to 1,000 (0-anchor), 9,000 (0.5-anchor), and 37,000 (1-anchor); and in test solution 3 ($TS3$), we combine a lower frequency cut (from two cases to one) with the same change of calibration anchors for $HW$ as in $TS2$.²²,²³

Note that we have set the test parameters such that all of them lie outside the sensitivity ranges determined in the previous step, that is, in the hard test range.²⁴ Additionally, we argue that in order to test robustness in a challenging manner, one should test parameters that are as different as possible, but within conceptually plausible ranges. For example, our alternative raw consistency threshold is far outside the sensitivity range found in Step 2 above (0.85–0.87), but still abides by the general guideline of not setting a threshold below 0.75. In other words, robustness tests should be set up so that they are as hard to pass as is conceptually plausible.²⁵
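When several analytic decisions are varied, it can help to enumerate their combinations systematically before deciding which ones are conceptually plausible. The base-R sketch below crosses the $IS$ choice with one hard-test alternative for each of the three decisions varied in this example; the column names and anchor labels are hypothetical, and the article itself uses only three hand-picked combinations ($TS1$ to $TS3$).

```r
# Each decision: value used for the IS first, hard-test alternative second
decisions <- expand.grid(
  incl.cut   = c(0.87, 0.75),
  n.cut      = c(2, 1),
  hw_anchors = c("3000/10500/28500", "1000/9000/37000"),
  stringsAsFactors = FALSE
)

# Remove the row that simply reproduces the IS specification
is_spec <- with(decisions,
  incl.cut == 0.87 & n.cut == 2 & hw_anchors == "3000/10500/28500")
test_specs <- decisions[!is_spec, ]

nrow(test_specs)  # candidate test solutions before any plausibility screening
test_specs
```

Each remaining row would then be run as a separate test solution, after dropping any combination judged conceptually implausible.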

Step 4: Obtain the $T S$ and the $R C$

In our example, we have produced three alternative solutions. One of the advantages of our approach to robustness is, however, that the number of these alternative solutions can be arbitrarily high. For practical purposes, in order to obtain the $T S$ , we need to create an object for our test solutions $T S 1$ , $T S 2$ , and $T S 3$ by binding all of them into a list.²⁶

TS <- list(TS1, TS2, TS3)

We now have the $IS$ and the $TS$. Both are needed to obtain the $RC$, which is defined as the overlap, or intersection, of the $IS$ and all solutions in the $TS$, that is, the intersection of the $IS$ and $minTS$. We can examine the $RC$ and its parameters of fit using the function rob.corefit() implemented in the SetMethods package. The parameters of fit of the $RC$ differ from those of the $IS$. This means that the overlap between them, and thus the robustness of the $IS$ to the changes tested, is less than perfect. As expected, the consistency of the $RC$ is higher and its coverage lower than that of the $IS$. This is because, by definition, the $RC$ can only be either identical to the $IS$ (perfect robustness) or a subset of it (see Figure 2 for a straightforward visualization of this point). Additionally, researchers can obtain the Boolean expression for the $RC$ by simply intersecting the $IS$ with all alternative solutions using the intersection() function in package admisc (Dusa 2020). Beyond parameters of fit, one can also compare the Boolean expression of the $IS$ with that of the $RC$ to check which terms of the $IS$ are more robust than others and how these terms change in the $RC$.

# Obtain parameters of fit for RC:
rob.corefit(test_sol = TS,
  initial_sol = IS,
  outcome = "HL")

         Cons.Suf Cov.Suf   PRI
Core.Fit    0.908   0.755 0.874

# Obtain Boolean expression for RC:
library(admisc)  # provides intersection()
intersection(IS, TS1, TS2, TS3)

E1: (GG*AH + HE*~GG*HI*HW)(HE + ~GG*HW)(GG*AH + AH*HI)(HE*GG*AH + HE*AH*HI)
I1: HE*GG*AH + HE*~GG*AH*HI*HW
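As an independent check on the simplification reported by intersection(), one can evaluate E1 and I1 with fuzzy-set operators (minimum for AND, maximum for OR, 1 minus the score for NOT) and confirm that they agree case by case, both on all crisp configurations and on random fuzzy scores. The sketch uses only base R; the helper names f_and(), f_or(), and f_not() are ours, and the assignment of the second to fourth factors of E1 to $TS1$ to $TS3$ follows the argument order of the intersection() call.

```r
f_and <- function(...) pmin(...)   # fuzzy AND
f_or  <- function(...) pmax(...)   # fuzzy OR
f_not <- function(x) 1 - x         # fuzzy NOT

# E1: IS intersected with TS1, TS2, and TS3 (unsimplified)
eval_e1 <- function(d) with(d, f_and(
  f_or(f_and(GG, AH), f_and(HE, f_not(GG), HI, HW)),  # IS
  f_or(HE, f_and(f_not(GG), HW)),                    # TS1
  f_or(f_and(GG, AH), f_and(AH, HI)),                # TS2
  f_or(f_and(HE, GG, AH), f_and(HE, AH, HI))         # TS3
))
# I1: simplified expression reported for the RC
eval_i1 <- function(d) with(d,
  f_or(f_and(HE, GG, AH), f_and(HE, f_not(GG), AH, HI, HW)))

# All 32 crisp configurations
crisp <- expand.grid(HE = 0:1, GG = 0:1, AH = 0:1, HI = 0:1, HW = 0:1)
# 1,000 random fuzzy cases
set.seed(42)
fuzzy <- as.data.frame(matrix(runif(5000), ncol = 5,
                              dimnames = list(NULL, c("HE", "GG", "AH", "HI", "HW"))))

all(eval_e1(crisp) == eval_i1(crisp))
all(eval_e1(fuzzy) == eval_i1(fuzzy))
```

Exact agreement on the fuzzy scores is expected here because this particular simplification relies only on absorption and distributivity, which hold for minimum and maximum, and not on complementation laws such as GG AND NOT GG = 0.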

Step 5: Calculate the RF Parameters

By comparing the parameters of fit of the $IS$ with those of the $RC$, we already notice that robustness is less than perfect. Our RF parameters, $RF_{cov}$, $RF_{cons}$, $RF_{SC\_minTS}$, and $RF_{SC\_maxTS}$, can be used to express numerically robustness understood as the size of the overlap between $IS$, $RC$, $minTS$, and $maxTS$, with values of 1 indicating perfect robustness. To obtain these parameters, we use the function rob.fit() in the SetMethods package. Users need to specify the list of test solutions, the $IS$, and the outcome under investigation.

rob.fit(test_sol = TS,
  initial_sol = IS,
  outcome = "HL")

               RF_cov RF_cons RF_SC_minTS RF_SC_maxTS
Robustness_Fit  0.917   0.966       0.883       0.752

The parameters $RF_{cov}$, $RF_{cons}$, $RF_{SC\_minTS}$, and $RF_{SC\_maxTS}$ are all lower than 1, but fairly high (between 0.752 and 0.966). This means that neither the overlap between $IS$ and $RC$ ($RF_{cov} < 1$ and $RF_{cons} < 1$) nor that between $IS$ and $minTS/maxTS$ ($RF_{SC\_minTS} < 1$ and $RF_{SC\_maxTS} < 1$) is perfect, but both are rather high. All in all, from a parameters-of-fit perspective, the $IS$ displays quite high robustness to the changes tested. This conclusion does not contradict the above finding of high sensitivity to even minor changes in the raw consistency and frequency thresholds. For the latter, the strict criterion of no change in the Boolean expression is applied, whereas here such changes are allowed and evaluated in terms of how much difference they make to consistency, coverage, and set coincidence.
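The fit-oriented parameters build on standard fuzzy-set measures. The base-R sketch below implements consistency and coverage of sufficiency as defined by Ragin (2006), together with set coincidence computed as the sum of the case-wise minima divided by the sum of the case-wise maxima of two membership vectors, which is how we read the coincidence questions for $RF_{SC\_minTS}$ and $RF_{SC\_maxTS}$ in Table 1. It does not reproduce rob.fit(), it does not re-implement the article's definitions of $RF_{cons}$ and $RF_{cov}$, and all membership scores and helper names are hypothetical.

```r
# Standard fuzzy-set measures (hypothetical helper names)
cons_suf <- function(x, y) sum(pmin(x, y)) / sum(x)   # consistency of X -> Y
cov_suf  <- function(x, y) sum(pmin(x, y)) / sum(y)   # coverage of Y by X
set_coincidence <- function(x, y) sum(pmin(x, y)) / sum(pmax(x, y))

# Hypothetical membership scores of five cases
is_memb <- c(0.9, 0.7, 0.6, 0.3, 0.1)  # IS
min_ts  <- c(0.7, 0.6, 0.3, 0.5, 0.1)  # minTS
max_ts  <- c(0.9, 0.9, 0.8, 0.7, 0.4)  # maxTS
y       <- c(0.8, 0.9, 0.4, 0.2, 0.6)  # outcome

set_coincidence(is_memb, min_ts)  # analogue of RF_SC_minTS
set_coincidence(is_memb, max_ts)  # analogue of RF_SC_maxTS
cons_suf(is_memb, y)
cov_suf(is_memb, y)
```

Set coincidence equals 1 only when both vectors are identical and drops as either set contains membership the other lacks, which is why it suits the question of whether the $IS$ coincides with $minTS$ or $maxTS$.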

Step 6: Identify Robustness-relevant Types of Cases and the RCRs

As explained above, fit-oriented robustness should be complemented with a case-oriented perspective. There are two main, mutually nonexclusive ways of implementing case-oriented robustness in practice. First, we can produce an XY plot of $minTS/maxTS$ against the $IS$ to obtain an initial visualization of the different types of cases, as well as of their membership in the outcome Y. Second, we can list which cases fall into which of the robustness-relevant types of cases (robust, shaky, and possible) that we introduced in the case-oriented robustness section and visualized in Figure 3. Based on this information, we calculate the robustness case ratios ($RCR$) and identify the case rank of the relation between $minTS/maxTS$ and the $IS$ ($RCC_{Rank}$).

To plot the $TS$ against the $IS$, we use the function rob.xyplot(). Looking at the XY plot in Figure 5, we see that several cases are in the upper-right and lower-left quadrants. As explained, these are robust cases: either typical (Y) or deviant consistency cases ($\sim Y$) in the upper-right quadrant, or extreme deviant coverage (Y) or irrelevant cases ($\sim Y$) in the lower-left quadrant. Having many cases in these two quadrants means that the classification of cases is robust to the changes tested. There are, however, also several cases in the upper-left quadrant. These are possible cases, included in the $maxTS$ but not in the $IS$. Some of them are possible typical cases (e.g., TUN, IRN, CRI), and others are possible deviant consistency cases (e.g., PHL, AZE, RUS). Additionally, several cases lie in the lower-right quadrant. They are covered by the $IS$ but not by the $minTS$. One of these cases, SRB, is a member of the outcome (as indicated by the full circle), which makes it a shaky typical case. The other cases in this quadrant (MNG, GEO, COL, and THA) are not members of the outcome, which makes them shaky deviant consistency cases. This means that under alternative analytic decisions, our $IS$ might “lose” one typical case and several deviant consistency cases.
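The quadrant logic just described can be written as a small classification rule (x-axis: $IS$; y-axis: $minTS$ for shaky cases and $maxTS$ for possible cases). The base-R sketch below is our own illustration, not the code of rob.cases(); the function name, the toy membership scores, and the convention of treating scores above 0.5 as set membership (so that exactly 0.5 counts as non-membership) are assumptions.

```r
case_type <- function(is_memb, min_ts, max_ts, y) {
  in_is <- is_memb > 0.5
  in_y  <- y > 0.5
  ifelse(in_is & min_ts > 0.5,                      # upper right: robust
         ifelse(in_y, "robust typical", "robust deviant consistency"),
  ifelse(in_is,                                     # lower right: in IS, not in minTS
         ifelse(in_y, "shaky typical", "shaky deviant consistency"),
  ifelse(max_ts > 0.5,                              # upper left: in maxTS, not in IS
         ifelse(in_y, "possible typical", "possible deviant consistency"),
         ifelse(in_y, "extreme deviant coverage", "irrelevant"))))  # lower left
}

# Hypothetical scores for eight cases, one per case type
is_memb <- c(0.9, 0.8, 0.7, 0.6, 0.2, 0.3, 0.1, 0.4)
min_ts  <- c(0.8, 0.7, 0.2, 0.3, 0.1, 0.2, 0.1, 0.2)
max_ts  <- c(0.9, 0.9, 0.6, 0.4, 0.7, 0.8, 0.3, 0.4)
y       <- c(0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.8, 0.2)

types <- case_type(is_memb, min_ts, max_ts, y)
table(types)
```

Since $minTS$ can never exceed $maxTS$, the four branches are exhaustive: every case is either robust (upper right or lower left), shaky, or possible.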

One of the main advantages of visualizing robustness with XY plots is that we can easily spot the case rank of the relation between $TS$ and $IS$ according to relevant case types. Remember that for evaluating this case rank, we do not take into account robust cases (upper-right quadrant) or extreme deviant coverage and irrelevant cases (lower-left quadrant), as these are irrelevant from our case-oriented robustness perspective.²⁷ In our example, since there are cases in both the lower-right and the upper-left quadrants, $TS$ and $IS$ are situated in the worst case rank, $RCC_{Rank} = 4$.

rob.xyplot(test_sol = TS,
  initial_sol = IS,
  outcome = "HL",
  all_labels = FALSE,
  fontsize = 3,
  jitter = TRUE,
  area_lab = TRUE)

Figure 5.

The relation between minimal test set, maximal test set, and the initial solution.

The function rob.cases() identifies which cases fall under which robustness-relevant case type. In addition, it automatically reports the $RCR$ parameters for typical and deviant consistency cases, together with $RCC_{Rank}$.²⁸

rob.cases(test_sol = TS,
  initial_sol = IS,
  outcome = "HL")

$CaseParameters
                      RCR_typ RCR_dev RCC_Rank
Robustness_Case_Ratio   0.841   0.273        4

$CaseTypes
Robust Typical Cases (IS*TS and Y > 0.5):
-------------------
Boolean Expression: GG*AH*HE + HE*~GG*HI*HW*AH
Cases in the intersection/Total number of cases: 37/131 = 28.24%
Cases in the intersection/Total number of cases Y > 0.5: 37/52 = 71.15%
Case Names:
ARG AUS AUT BRB BEL CAN HRV CYP CZE DNK EST FIN FRA DEU GRC HUN ISL
IRL ISR ITA JPN KOR LUX MLT MNE NLD NOR POL PRT SVK SVN ESP SWE CHE
GBR USA URY

We see that $RCR_{typ}$ is 0.841, meaning that 84.1 percent of all potential typical cases are robust. $RCR_{dev}$, on the other hand, is 0.273, meaning that only 27.3 percent of all potential deviant consistency cases are robust. This is in line with Figure 5, which shows very few robust deviant consistency cases but many possible and shaky ones. Additionally, the output reports how many cases of a given type there are, both out of all cases in the analysis and out of all cases that share the same qualitative membership in the outcome. This gives researchers more detailed information on the raw number and share of a particular case type beyond the case ratios. For example, we have 37 robust typical cases, which amount to 28.24 percent of the cases in the analysis and to 71.15 percent of all cases that are members of the outcome. In other words, more than two thirds of the cases that have the outcome to be explained are robust typical cases. Finally, the names of all cases are listed by type (here we only report the robust typical cases due to space considerations). This is particularly useful when there are many cases and depicting them graphically in an XY plot is less feasible.

Step 7: Interpret the Robustness Results

The information revealed by our QCA Robustness Test Protocol and the various parameters it produces shows that while our $IS$ is relatively sensitive to modifications of the calibration thresholds for condition $HW$, the raw consistency threshold, and the frequency cut, it is relatively robust in terms of fit. A large part of the $IS$ is robust (values close to 1 for $RF_{cov}$ and $RF_{cons}$), and the $IS$ overlaps to quite a large extent with the alternative solutions created (high values for $RF_{SC\_minTS}$ and $RF_{SC\_maxTS}$). When it comes to cases, however, our solution is subject to both “shaky” and “possible” cases, placing it in $RCC_{Rank} = 4$. Looking at the case ratios, we see that the deviant consistency in kind cases are the most problematic, as very few of them are robust: only six of these cases are robust, whereas four are shaky and 12 are classified as possible.²⁹
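The reported $RCR_{dev}$ can be reproduced from these counts if the ratio is read as robust cases divided by all robustness-relevant cases of the same outcome type (robust, shaky, and possible). The base-R helper below encodes that reading; the function name is hypothetical and it is not the rob.cases() source.

```r
# Robust cases as a share of robust + shaky + possible cases of one outcome type
rcr <- function(n_robust, n_shaky, n_possible) {
  n_robust / (n_robust + n_shaky + n_possible)
}

# Deviant consistency cases in the example: 6 robust, 4 shaky, 12 possible
round(rcr(n_robust = 6, n_shaky = 4, n_possible = 12), 3)
```

Here 6 / 22 ≈ 0.273, which matches the $RCR_{dev}$ value reported by rob.cases().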

To further disaggregate these results, our protocol also allows for checking which alternative test solution turned out to be most problematic. This can inform further analysis and/or conceptual work by identifying the specific analytic changes that are most consequential for robustness. The function rob.singletest() provides this information by reporting, for each individual test solution, its set coincidence with the $IS$ (for evaluating fit) and its $RCC_{Rank}$ relative to the $IS$. We see that $TS1$ is listed first and thus identified as the worst for robustness. This solution was produced by lowering the raw consistency threshold from 0.87 to 0.75. It leads to both “possible” and “shaky” cases ($RCC_{Rank} = 4$) and has the lowest set coincidence with the $IS$ ($SC = 0.752$).

rob.singletest(test_sol = TS,
  initial_sol = IS,
  outcome = "HL")

  Model                RCC_Rank    SC
1 HE + ~GG*HW                 4 0.752
2 GG*AH + AH*HI               4 0.887
3 HE*GG*AH + HE*AH*HI         3 0.883
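The order of the rob.singletest() output is consistent with sorting test solutions first by $RCC_{Rank}$ (worst first) and then by set coincidence (lowest first): $TS3$ has a lower SC than $TS2$ but appears after it because its case rank is better. The base-R sketch below reproduces that ordering from the reported values; the sorting rule is our inference rather than documented package behavior, and the $TS2$/$TS3$ labels follow the factor order of the intersection() call in Step 4.

```r
# Values as reported by rob.singletest() above
single <- data.frame(
  test     = c("TS1", "TS2", "TS3"),
  model    = c("HE + ~GG*HW", "GG*AH + AH*HI", "HE*GG*AH + HE*AH*HI"),
  RCC_Rank = c(4, 4, 3),
  SC       = c(0.752, 0.887, 0.883),
  stringsAsFactors = FALSE
)

# Assumed rule: highest (worst) case rank first, then lowest set coincidence
worst_first <- single[order(-single$RCC_Rank, single$SC), ]
worst_first$test

# For contrast, sorting by set coincidence alone would swap TS2 and TS3
single$test[order(single$SC)]
```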

Applied QCA researchers should provide information on their robustness tests in a concise manner. While further details can be provided in an Online Appendix, the summary of robustness tests should be shown in the main text, if possible. We provide here a robustness protocol report table (Table 2) as one way of displaying such a summary about the robustness tests.

Table 2.

Robustness Protocol Report.

Sensitivity Ranges
	Condition	0	0.5	1
Calibration anchors	COND 1	Lower:	Lower:	Lower:
		Upper:	Upper:	Upper:
	COND 2	…	…	…
Parameters	Raw consistency	Lower:	Threshold:	Upper:
	Frequency	Lower:	Threshold:	Upper:
Robustness parameters
Fit oriented	$RF_{cons}$:	$RF_{cov}$:	$RF_{SC\_minTS}$:	$RF_{SC\_maxTS}$:
Case oriented	$RCR_{typ}$:	$RCR_{dev}$:	$RCC_{Rank}$:
Worst performing model
Model:

Finally, beyond the standard steps of the robustness test protocol presented above, one could further engage with the notions introduced here to improve the $IS$. For example, if the $RC$ has an acceptable coverage, one could decide to focus the interpretation of the QCA results on the $RC$ alone at the end of the protocol in order to avoid shaky and possible cases. Alternatively, if the $IS$ has low coverage, one could check which specific analytic changes lead to the emergence of possible typical cases and implement that change in an improved QCA solution with larger coverage, but only, of course, if such changes fall within the conceptually plausible range.

Concluding Remarks

When conceptual borders are easily definable, when cases cluster neatly in truth table rows, or when the number of cases in the analysis is small, analytic decisions related to calibration or to the truth table analysis can be straightforward. In applied QCA, though, this is rarely the case. Conceptual borders are often imprecise and, consequently, the exact placement of calibration anchors is open to debate. Cases are sometimes numerous and do not cluster neatly in specific truth table rows. This often makes decisions on the raw consistency threshold and frequency cut ambiguous. In such situations, we argue that researchers should apply systematic robustness tests to assess the consequences of changes in their various analytic decisions.

In this article, we have made proposals on how to advance the discussion on both the principles and the practices of robustness tests in QCA. We believe that robustness is a multifaceted concept that needs to be assessed from various complementary angles. We have introduced the notion of sensitivity range as the range of values for calibration anchors, raw consistency, and frequency cutoffs within which our QCA result does not change. Going beyond sensitivity ranges, we have proposed various numerical robustness parameters that express how much and in which way QCA results change. We have defined different types of cases whose existence and relative frequencies provide additional information on how robust our QCA results are. Taken together, sensitivity ranges and the fit-oriented and case-oriented perspectives form an integrated QCA Robustness Test Protocol, for both crisp and fuzzy set data, that is implemented in the R software package SetMethods.

While the tools introduced here allow for the implementation of this robustness protocol in a straightforward manner, we envisage several developments. As it stands now, the protocol assumes that cases’ membership in the outcome remains unchanged, both in the identification of robust, possible, and shaky typical and deviant cases and in the calculation of parameters of fit. Future developments could overcome this limitation related to outcome recalibration. Future developments could also focus on ways to expand the robustness protocol to situations in which changes to the selection of cases and of conditions are made. For such changes, it should be kept in mind, though, that they tend to pertain to the domain of QCA as an approach rather than as a technique, because they constitute more fundamental changes to the research design of a QCA.

Finally, we argue that future developments should not abuse the more automated approach to robustness that our functions in package SetMethods undoubtedly provide. Such abuse would consist in letting the algorithm identify the optimal thresholds for calibration, raw consistency, or row frequency, where optimal would be defined as “most robust.” This is why we suggest that best practice should be not only to first provide a QCA solution (which here we have labeled $IS$) but also to specify an expected substantive plausibility range derived from theoretical and conceptual considerations. This recommendation is particularly salient for the calibration range because the location of the calibration anchors should be strongly driven by conceptual considerations on the meaning of the set to be calibrated.

All in all, we believe it is important that performing and reporting robustness tests become standard practice in QCA. With the conceptual and computational tools presented in this article, there are fewer reasons to object to this emerging standard of good practice.

Supplemental Material

Supplemental Material, sj-pdf-1-smr-10.1177_00491241211036158 - A Robustness Test Protocol for Applied QCA: Theory and R Software Application

Supplemental Material, sj-pdf-1-smr-10.1177_00491241211036158 for A Robustness Test Protocol for Applied QCA: Theory and R Software Application by Ioana-Elena Oana and Carsten Q. Schneider in Sociological Methods & Research

Footnotes

Authors’ Note

Rotation principle: both authors contributed equally. This article has benefited from fruitful feedback provided by participants of the second International Qualitative Comparative Analysis Expert Workshop in Antwerp 2019 and of the ECPR Methods Schools in Budapest and Bamberg in 2018 and 2019 and virtual in 2020. We also thank the anonymous reviewers whose comments have improved our arguments in decisive ways.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Ioana-Elena Oana

Supplemental Material

The supplemental material for this article is available online at .

Notes

References

Arel-Bundock. 2019. “The Double Bind of Qualitative Comparative Analysis.” Sociological Methods & Research. DOI:10.1177/0049124119882460.

Baumgartner and Thiem. 2017. “Often Trusted But Never (Properly) Tested: Evaluating Qualitative Comparative Analysis.” Sociological Methods & Research 49:279–311.

Braumoeller, B. F. 2015. “Guarding against False Positives in Qualitative Comparative Analysis.” Political Analysis 23(4):471–87.

Cooper and Glaesser. 2015. “Exploring the Robustness of Set Theoretic Findings from a Large n fsQCA: An Illustration from the Sociology of Education.” International Journal of Social Research Methodology 19(4):445–59.

Dusa. 2018. QCA with R. Berlin, Germany: Springer.

Dusa. 2019. “Critical Tension: Sufficiency and Parsimony in QCA.” Sociological Methods & Research. DOI:10.13140/RG.2.2.34374.32325.

Dusa. 2020. “Admisc: Adrian Dusa’s Miscellaneous.” https://CRAN.R-project.org/package=admisc. R package version 0.11.

Emmenegger, Schraff, and Walter. 2014. “QCA, the Truth Table Analysis and Large-N Survey Data: The Benefits of Calibration and the Importance of Robustness Tests.” COMPASSS Working Paper, 2014-79. http://www.compasss.org/wpseries/EmmeneggerSchraffWalter2014.pdf.

Gibson and Burrell. 2018. “BRAQCA: Bootstrapped Robustness Assessment for Qualitative Comparative Analysis.” https://cran.r-project.org/web/packages/braQCA/. R package version 1.0.0.1.

Hofstad. 2019. “QCA & the Robustness Range of Calibration Thresholds: How Sensitive Are Solution Terms to Changing Calibrations?” COMPASSS Working Paper, 2019-92. http://www.compasss.org/wpseries/Hofstad2019.pdf.

Hug. 2013. “Qualitative Comparative Analysis: How Inductive Use and Measurement Error Lead to Problematic Inference.” Political Analysis 21(2):252–65.

Krogslund, D. D. Choi, and Poertner. 2015. “Fuzzy Sets on Shaky Ground: Parametric and Specification Sensitivity in fsQCA.” Political Analysis 23(1):21–41.

Lucas, S. R., and Szatrowski. 2014. “Qualitative Comparative Analysis in Critical Perspective.” Sociological Methodology 44(1):1–79.

Mendel, J. M., and M. M. Korjani. 2018. “A New Method for Calibrating the Fuzzy Sets Used in fsQCA.” Information Sciences 468:155–71.

Muchlinski, Siroky, and Kocher. 2016. “Comparing Random Forest with Logistic Regression for Predicting Class-imbalanced Civil War Onset Data.” Political Analysis 24(1):87–103.

Oana, I. E., and C. Q. Schneider. 2018. “SetMethods: An Add-on R Package for Advanced QCA.” The R Journal 10(1):507–33.

Paykani, Rafiey, and Sajjadi. 2018. “A Fuzzy Set Qualitative Comparative Analysis of 131 Countries: Which Configuration of the Structural Conditions Can Explain Health Better?” International Journal for Equity in Health 17(1):1–13.

Ragin, C. C. 2006. “Set Relations in Social Research: Evaluating Their Consistency and Coverage.” Political Analysis 14(3):291–310.

Ragin, C. C., and P. C. Fiss. 2016. Intersectional Inequality: Race, Class, Test Scores, and Poverty. Chicago: University of Chicago Press.

Rohlfing. 2015. “Mind the Gap: A Review of Simulation Designs for Qualitative Comparative Analysis.” Research & Politics 2(4):1–4.

Rohlfing. 2016. “Why Simulations Are Appropriate for Evaluating Qualitative Comparative Analysis.” Quality & Quantity 50:2073–84.

Rohlfing. 2018. “Power and False Negatives in Qualitative Comparative Analysis: Foundations, Simulation and Estimation for Empirical Studies.” Political Analysis 26(1):72–89.

Rutten. 2020. “Applying and Assessing Large-n QCA: Causality and Robustness from a Critical Realist Perspective.” Sociological Methods & Research. DOI:10.1177/0049124120914955.

Schneider, C. Q. 2018. “Realists and Idealists in QCA.” Political Analysis 26(2):246–54.

Schneider, C. Q., and Rohlfing. 2013. “Combining QCA and Process Tracing in Set-theoretic Multi-method Research.” Sociological Methods & Research 42(4):559–97.

Schneider, C. Q., and Wagemann. 2012. Set-theoretic Methods for the Social Sciences: A Guide to Qualitative Comparative Analysis. Cambridge: Cambridge University Press.

Skaaning, S. E. 2011. “Assessing the Robustness of Crisp-set and Fuzzy-set QCA Results.” Sociological Methods & Research 40(2):391–408.

Thiem, Spöhel, and Duşa. 2016. “Enhancing Sensitivity Diagnostics for Qualitative Comparative Analysis: A Combinatorial Approach.” Political Analysis 24(1):104–20.

