Sage Journals: Discover world-class research

Abstract

Qualitative comparative analysis (QCA) is an empirical research method that has gained some popularity in the social sciences. At the same time, the literature has long been convinced that QCA is prone to committing causal fallacies when confronted with non-causal data. More specifically, beyond a certain case-to-factor ratio, the method is believed to fail in recognizing real data. To reduce that risk, some authors have proposed benchmark tables that put a limit on the number of exogenous factors given a certain number of cases. Many applied researchers looking for methodological guidance have since adhered to these tables. We argue that fears of inferential breakdown in QCA due to an “unfavorable” case-to-factor ratio are without foundation. What is more, we demonstrate that these benchmarks induce more fallacious inferences than they prevent. For valid causal inference, researchers are better off relying on the current state of knowledge in their respective fields.

Introduction

Qualitative comparative analysis (QCA) is a configurational research method that has gained some popularity in the social sciences, particularly in business, management, sociology, and political science (Rihoux et al. 2013; Thiem 2022a; Wagemann et al. 2016). At the same time, the methodological literature on QCA has long been convinced that the method is prone to committing causal fallacies when confronted with non-causal data.¹ In a widely noted symposium, Lieberson (2004:13–14), for instance, had hypothesized about 20 years ago that “[QCA’s] procedures do not rule out the possibility that the observations are all a random matter and/or that none of the causal variables were even measured.” Ten years later, Lucas and Szatrowski (2014:61) noted critically that the method “detects complex causality even though the data are noncausal,” a result that raises “serious doubts about QCA’s ability to avoid certifying spurious causal patterns.”² Such statements and other similar ones have since nurtured grave misgivings that the models inferred by QCA from a set of data would be nothing but algorithmic artefacts.

To minimize the risk of QCA committing causal fallacies, Marx (2010) as well as Marx and Dușa (2011) (the latter hereafter MD) have proposed comprehensive benchmark tables of case-to-factor ratios. For a given number of cases researchers may have in their data, these tables list an upper limit on the number of exogenous factors beyond which the models returned by QCA should not be trusted any more. Many applied researchers looking for methodological guidance on how to conduct a QCA study have since adhered to these tables (e.g., Avdagic 2010; Fagerholm 2016; Ide 2018; Muriaas et al. 2022). In contrast, we argue that such fears of a declining ability of QCA to properly uncover causal structures due to unfavorable case-to-factor ratios are without foundation. This argument is based on two pillars. First, every set of data generated by any proper causal structure can always be duplicated by an identical set of data generated by a purely random process. Demanding that a method abstain from inferences when the data could be non-causal is, therefore, futile. Instead, the expectation must be that a method presents an inference whenever the analyzed data could have been generated by a proper causal structure. The difference appears subtle, but its implications are consequential. Second, yet more importantly, we demonstrate, using extensive simulations and case-based evidence, that adherence to MD’s benchmarks tends to induce more fallacious inferences than it prevents. In other words, if their goal is valid causal inference with QCA, then applied researchers are better off not following MD’s benchmarks but, instead, should rely on the current state of knowledge in their respective field.

The structure of our article is as follows. First, we briefly contextualize and summarize MD’s argument. We then identify two consequential problems in their simulation design. Subsequently, we demonstrate the repercussions for causal inference with QCA when relying on MD’s benchmark tables.

The Context of Marx and Duşa’s Benchmark Tables

In a widely noted symposium comment on the use and utility of QCA, Harvard sociologist Stanley Lieberson (2004:14) had once hypothesized that the method would infer causal claims even from non-causal data, and would, therefore, be of no help to applied researchers interested in analyzing cause-effect relations. In reaction to this assertion, Marx (2010) carried out simulations with up to 50 cases and eight exogenous factors to test whether QCA would really be prone to derive consistent solutions from such data. The results led him to conclude that, given a certain number of cases, there indeed exists an upper limit on the number of exogenous factors that can be safely included in an analysis.³ Beyond this limit, QCA would lose its power to recognize real data.

In a follow-up study, MD then extended the simulations implemented by Marx (2010) to 300 cases and 13 exogenous factors with the goal of establishing more comprehensive benchmark tables. To this end, the authors simulated no fewer than 5,382,000 data sets, 1,500 for each combination of a number of cases between two and 300, and a number of exogenous factors between two and 13.⁴ Based on the results of these simulations, they reconfirmed Marx’s (2010) earlier findings that “[g]iven any particular number of cases, there is a ceiling to the number of conditions which can be included safely in an analysis” (MD 2011:121). For instance, users of QCA are cautioned against including more than four exogenous factors when they do not have more than 13 cases in their data (pp. 114–15). Many applied researchers from areas as diverse as education, management, political science, and sociology have since adhered to MD’s benchmarks when carrying out their empirical analyses (e.g., Avdagic 2010; Boogaerts and Drieskens 2020; Fagerholm 2016; Grauvogel and von Soest 2014; Holvoet and Dewachter, 2013; Ide 2018; Marques and Salavisa 2017; Misangyi 2016; Muriaas et al. 2022; Schneider and Sadowski 2010; Zhu et al. 2019). Somewhat surprisingly, however, no methodological interest in these tables at all has been observable yet. Since their publication more than 10 years ago, they have been accepted as authoritative by course instructors of QCA, by applied users of QCA, and by journal reviewers of QCA manuscripts. The remainder of the present article is devoted to challenging this received wisdom.

Evaluating Marx and Duşa’s Benchmark Tables

Problems in Design

As means of assessment, MD invoke the two related concepts of contradiction and consistency because “contradictions will always occur if the explanatory model is not correctly specified” (2011:104) and “ill-specified models will generate very low consistency results” (p. 110). Thus, they deduce that “[i]f random data results in the absence of contradictions and high levels of consistency it indicates that csQCA is not able to distinguish real from random data” (p. 121). Whether an application of QCA is marked by contradictions is measured by the presence of at least one contradictory row in the truth table (p. 111), whether an application of QCA is beset by the problem of inconsistencies by the ratio of the number of cases in all non-contradictory truth table rows to the total number of cases (p. 113). As a contradiction simply refers to a situation in which the consistency of a truth table row is larger than 0 but smaller than 1, we do not treat contradictions and inconsistencies separately.⁵

Two problems beset the simulation design set up by MD, the first of which is relatively peripheral yet still worth being pointed out, whereas the second one is centrally important. The first problem occurs when the authors argue that “ill-specified models will generate very low consistency results. Consequently, a csQCA analysis on random data should result in low consistency results” (p. 110). This statement has two problematic components. First, it is not true that an ill-specified model will necessarily produce low consistency scores. For instance, when a causally relevant factor that does not share a path to the outcome with any other causally relevant factor that is included in a QCA run is omitted from an analysis, the model is ill-specified but can still show high consistency scores (see Baumgartner and Thiem 2020:294–96; Thiem 2015:728). Second, the decreasing rate of occurrence of contradictions over decreasing diversity index values in MD’s benchmark tables is not a consequence of QCA’s decreasing ability to distinguish causal from non-causal data. Instead, it simply confirms some basic principles of probability theory.

To see this, pay attention to the rule whereby MD (p.126) create their data sets: for a given combination of a number of cases n and a number of factors $k + 1$ , randomly assign the value $0$ or the value $1$ with equal probability $p = 0.5$ along the factors $f_{i}$ , with $i = 1, 2, \dots, k + 1$ , for every case $c_{j}$ , with $j = 1, 2, \dots, n$ , in a data set of dimension $n \times (k + 1)$ . Knowing $k$ and $p$ , the values MD obtain for $n = 2$ , for example, can be easily derived without any simulation or application of QCA via the joint probability of $k + 1$ independent events, where each event represents the probability that $c_{2}$ has identical values on the $k$ exogenous factors as $c_{1}$ but a different value on the endogenous factor $f_{k + 1}$ . The probability of having a contradiction with two exogenous factors is thus simply ${0.5}^{3} = 0.125$ , with three such factors ${0.5}^{4} = 0.0625$ , and so on, or, more generally, $1 / 2^{k + 1}$ .

When going beyond two cases, the formula becomes more complex, but all that is still required is basic probability theory. None of QCA’s defining features, such as truth tables or Boolean minimization, need be involved at any stage.⁶ For interested readers, we have put the technicalities of our argument in Appendix 1 and corresponding replication code is available in the accompanying replication script in Appendix 2. To summarize, what MD (pp. 114–15) propose, having run almost 5,400,000 simulations, is not a table that testifies to the decreasing capability of csQCA to distinguish causal from non-causal data, but a table that confirms that QCA behaves in full accordance with what basic laws of probability theory require. Such behavior can, of course, hardly count against the method. For arguing that QCA is deficient in recognizing real causal structures beyond a certain case-to-factor ratio, some specific procedural or algorithmic component of QCA would have to be proven deficient in some sense.

While the first problem of MD’s simulation design is relatively peripheral, the second is not. It relates to the assumption that QCA should not infer any model from the data when their source could be a random process. As we show in the following, such a requirement is completely futile as every set of data generated by any proper causal structure can always be duplicated by an identical set of purely random data, that is, a set of data that is completely indistinguishable from the set of data a causal structure generates. That being the case, researchers must instead expect a method to always return all those models that could have generated the data. The difference appears subtle, but its implications are consequential.

To illustrate this, we present an experiment whose insights can be reconfirmed using any other parallel set-up. In this experiment, we first randomly generate 100 proper causal structures involving four exogenous factors $A, B, C,$ and $D$ . We then test how many draws of randomly generated data it takes to exactly match the data that each of these 100 causal structures generates.⁷ The results for 15 selected causal structures are shown in the bar chart in Figure 1. The number placed next to the bars indicates the rank the respective causal structure has with regard to the number of draws required to make randomly and structurally generated data match perfectly.

Figure 1.

Number of draws to match randomly and structurally generated data.

Presented are the five structures with the fewest required draws (structures 1–5), the five structures around the median number of draws (structures 48–52), and the five structures with most draws required (structures 96–100). At the lower end, not even 400 were needed to randomly generate data identical with the data generated by the proper causal structure $a b + A B + a C + A c + b d \Leftrightarrow \dots$ , whereas at the upper end it took more than 500,000 draws to randomly generate data identical with the data generated by $b + a c D + A C d \Leftrightarrow \dots$ . Irrespective of the make-up of a given structure, however, it was always possible to randomly produce a set of data indistinguishable from the set of data generated by that structure. Demanding that a method abstains from inferences when the data could have been generated by a random process is therefore tantamount to demanding that the method never infers anything. All researchers can expect from a method of causal inference is that, when there is a causal structure that could have generated some set of data to be analyzed, this structure must be identified by this method, ideally to the maximum extent made possible by the available data.

Consequences for Causal Inference

Worse than the two problems of MD’s simulation design, which merely relate to the criteria under which QCA’s properties should be evaluated, are the repercussions that follow from the eventual application of their benchmark tables in empirical research. That is because the ratios provided in these tables implicitly condition the complexity some independent causal structure can have on the available data, thereby effectively introducing a configurational form of omitted variable bias. To see this, consider the following two scenarios, the first of which has deliberately been set up with extreme parameters to bring out as clearly as possible the gist of the argument.

First, suppose researcher John Doe has access to data for two cases

c_{1}

and

c_{2}

on 30 exogenous factors

X_{1}

X_{30}

that are hypothesized by Doe to be causally relevant to the endogenous factor

Y

. Further suppose that there are no omitted factors causally relevant to

Y

(some factors included in Doe’s data may be causally irrelevant to

Y

, but this possibility is inconsequential). Case

c_{1}

instantiates one conjunction of conditions, whereas

c_{2}

instantiates another conjunction that is different from the conjunction

c_{1}

instantiates only with respect to the values on

X_{10}

and

Y

. More specifically, while

c_{1}

shows the value 1 on

X_{10}

and

Y

c_{2}

shows the value 0 on these factors. The corresponding data to this scenario are shown in Table 1.

Table 1.

Ideal Situation of Configurational Difference-Making.

Case	$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{5}$	$X_{6}$	$X_{7}$	$X_{8}$	$X_{9}$	$X_{10}$	$X_{11}$	$X_{. .}$	$X_{29}$	$X_{30}$	Y
$c_{1}$	1	0	1	0	0	1	1	0	1	1	0	..	1	0	1
$c_{2}$	1	0	1	0	0	1	1	0	1	0	0	..	1	0	0

According to MD’s benchmarks, any analysis with fewer than seven cases is futile whatever the number of exogenous factors, and having 30 exogenous factors—13 is the limit in their tables— would require tens of thousands of cases.⁸ Clearly, software is dispensable here; it is visible to the naked eye that $X_{10}$ must be a causally relevant factor because $X_{10}$ is a Boolean difference-maker to $Y$ in an otherwise perfectly homogeneous setting.⁹ Consequently, the full QCA solution is given by $X_{10} Ψ + Φ \Leftrightarrow Y$ , where Ψ and Φ represent the usual placeholders for as yet unknown conjuncts of $X_{10}$ , disjuncts other than $X_{10} Ψ$ , respectively. Additional data would then gradually allow Doe to further complete this causal model, either with respect to additional conjuncts to $X_{10}$ or additional disjuncts. Put differently, data situations that are extremely amenable to causal inference with QCA are ruled out ex ante by MD’s benchmarks as unsuitable. The more pressing question, however, is what happens when a situation in which these benchmarks are applied occurs.

Let us first consider what consequences a limit on the number of factors, given any number of cases Doe may have data on, could have for the inferences Doe presents in published work. Generally, there are two classes of scenarios, one markedly worse than the other: first, Doe may present fewer correct inferences than he could have presented on the basis of the available data but does not present any false inferences; second, Doe may present false inferences and may or may not present some correct inferences that could have been presented based on the available data. In the first group of situations, relevant information is not leveraged. In the second case, not only may information not be leveraged but the inferences that are presented are incompatible with the causal structure being searched for.

As an illustrative example, consider the following situation, where Doe attempts to find the true causal structure

x_{1} X_{2} x_{5} + X_{2} x_{3} + X_{4} \Leftrightarrow Y

, or at least uncover as many parts of it as possible. Usually, researchers will have data on factors that cannot be shown to be causally relevant, data on factors that are causally irrelevant, and they will lack data on factors that are causally relevant. At the very least, they should have data on factors that are causally relevant and can be identified as such in their correct position. Suppose this time Doe has gathered the data on 11 cases

c_{1}

c_{11}

presented in Table 2. As Doe has no access to data on

X_{5}

, he cannot uncover the true causal structure in its entirety. Furthermore, the data include the factor

X_{6}

, which is, however, not relevant to

Y

. At the same time, Doe has data on

X_{1}

X_{2}

X_{3}

and

X_{4}

, all of which matter on at least one causal path to

Y

Table 2.

Example of Realistic Data Set for Use in QCA.

Case	$X_{1}$	$X_{2}$	$X_{3}$	$X_{4}$	$X_{6}$	$Y$
$c_{1}$	0	1	1	0	0	1
$c_{2}$	1	1	1	0	0	0
$c_{3}$	1	1	0	0	0	1
$c_{4}$	1	1	1	1	1	1
$c_{5}$	0	1	1	1	0	1
$c_{6}$	0	0	0	1	0	1
$c_{7}$	0	0	1	0	0	0
$c_{8}$	1	1	0	1	0	1
$c_{9}$	0	0	0	0	0	0
$c_{10}$	1	0	1	0	1	0
$c_{11}$	1	1	1	0	1	0

MD’s benchmarks warn researchers against using more than three exogenous factors with 11 cases. Doe heeds this warning and thus needs to make a choice. He decides to drop $X_{4}$ and $X_{6}$ . What is the consequence? Having excluded $X_{4}$ , QCA cannot now explain case $c_{6}$ , as $c_{6}$ is uniquely explained by $X_{4}$ . At the same time, dropping $X_{6}$ is inconsequential. Thus, QCA will yield the following model as part of its solution: $x_{1} X_{2} + X_{2} x_{3} \Rightarrow Y$ .¹⁰ This model is entirely correct, if not complete, since no part of it contradicts the true causal structure that Doe seeks to uncover.¹¹ What, however, if Doe decides to exclude $X_{2}$ and $X_{4}$ ? In that case, QCA will return $X_{1} x_{3} \Rightarrow Y$ as the only model of its solution, which leads to a complete fallacy. This model contains no correct inference. In other words, the consequences of dropping $X_{2}$ and $X_{4}$ are markedly worse than those of dropping $X_{4}$ and $X_{6}$ . Yet, applied researchers working with empirical data never know which situation pertains in their individual case. They may be lucky or not.

Finally, let us see what happens when Doe decides to disregard MD’s benchmark tables, and to drop none of the factors he has gathered data on. In this case, QCA will produce the model $x_{1} X_{2} + X_{2} x_{3} + X_{4} \Leftrightarrow Y$ as part of its solution. This model is not only entirely correct in its inferences, but it is also as close as Doe could get to the true causal structure on the basis of the data available to him. In future research, he may then be able to collect data on $X_{5}$ and complete the inference.

In summary, Doe has not been put on a wrong path in the first scenario, but disregarding MD’s benchmarks would have led him closer to the truth. In contrast, under the second scenario, QCA is forced to commit a causal fallacy when adhering to MD’s benchmarks, which could have been prevented had Doe decided not to use them. More generally, unless researchers drop causally irrelevant factors only, it seems, inferences based on all potentially relevant factors tend to inferentially outperform inferences based on a proper subset of those factors.

To test whether this hypothesis receives support, we conduct another simulation that generalizes the research situation of John Doe laid out above. In this simulation, we randomly generate 20 causal structures based on six exogenous factors $A$ to $F$ . To mimic situations in which not all of these six factors are indeed causally relevant to the outcome, the requirement is that at least two factors constitute the respective causal structure. Furthermore, the simulation assumes that 13 configurationally distinct cases are available, yielding a diversity index value of $13 / 64 \approx 0.2$ .¹²

As an example, consider a situation in which the true causal structure to be uncovered is given by $A d + C D + F \Leftrightarrow Y$ . Out of all six exogenous factors $A$ to $F$ in Doe’s data, only four $—$ A, $C$ , $D$ , and $F —$ are causally relevant to $Y$ . With 13 cases, MD’s benchmarks allow up to four exogenous factors. Having six factors in the data, Doe may be lucky, chooses $A$ , $C$ , $D,$ and $F$ , and would thus be able to discover the causal structure in its entirety from the 13 cases available. But it could also be that he chooses $B$ , $C$ , $E$ , and $F$ . In that case, Doe would not be able to discover the full structure, but it would be possible to at least discover proper substructures of the following architecture: $C \Leftrightarrow Y$ , $F \Leftrightarrow Y,$ and $C + F \Leftrightarrow Y$ . Alternatively, he may be completely off track with this choice if QCA returned a structure not in agreement with the true causal structure operating behind his data, such as $B c + E f \Leftrightarrow Y$ , for instance.

There are $(\begin{array}{l} 6 \\ 2 \end{array}) = 15$ different possibilities for Doe to drop two out of six factors to reach the maximum number of four factors allowed by MD’s benchmark tables. To see how QCA solutions behave across the full spectrum of possible research scenarios, we analyze all tuples of 1 (6 possibilities), 2 (15), 3 (20), 4 (15), and 5 (6 possibilities) factors that could be dropped by Doe from his data. We thus have $20 ((\begin{array}{l} 6 \\ 0 \end{array}) + (\begin{array}{l} 6 \\ 1 \end{array}) + (\begin{array}{l} 6 \\ 2 \end{array}) + (\begin{array}{l} 6 \\ 3 \end{array}) + (\begin{array}{l} 6 \\ 4 \end{array}) + (\begin{array}{l} 6 \\ 5 \end{array})) = 1,260$ different analytical scenarios in our simulation. With no factor dropped, it would, of course, be possible to recover the entire causal structure, irrespective of how many truly irrelevant factors would have been included by Doe. On the other hand, dropping factors up to the point where a null result always represents the solution guarantees that the solution is correct, but does not help in approaching the true causal structure.¹³ These two sets of analyses thus represent the border cases that form the analytical boundary around all other scenarios.

Figure 2 visualizes the correctness ratios (in percent) for all 20 causal structures across all scenarios. The abscissa lists all 20 randomly generated causal structures (for improved readability, only the antecedent is given) ranked according to their sum total of correctness ratios (increasing), the ordinate the number of exogenous factors included in the set of analyses. The number of four exogenous factors recommended by MD’s benchmark tables is marked in bold. Completely black indicates an average correctness ratio of less than 10%; completely white indicates a correctness ratio of at least 90%. In other words, the brighter the rectangle, the better.

Figure 2.

Correctness ratios, in percent, across 20 different causal structures.

There are a few situations in which it is preferable to choose four instead of three exogenous factors. This is the case, for example, for the causal structures $B + f + a e \Leftrightarrow Y$ and $f + B C D \Leftrightarrow Y$ . But under no circumstances is it preferable to choose four instead of five exogenous factors. In other words, as a strategy, not adhering to MD’s tables is always outperforming adherence, irrespective of the complexity of the causal structure to be recovered. Figures range from a difference in correctness ratios of about 3.3 percentage points for $a b + B e + c D$ up to 30 percentage points for structures $A D + D E + d f + D F$ , $c E + D E$ , and $b F + B f + c E + E f$ . These results provide clear support for our claim that QCA is not losing its ability to distinguish causal from non-causal data beyond a certain case-to-factor-ratio as suggested by MD. In fact, our results show the opposite: It is never preferable to adhere to MD’s benchmark tables if priority lies on generating valid causal inferences. The question that remains for applied researchers, then, is what to do about their data if they have more potentially relevant exogenous factors than MD’s benchmark tables would allow them to include in their analysis. Our answer: nothing. Before concluding, we complement our simulation with a replication of an applied QCA study that has followed MD’s benchmark tables to see what effects a strategy of non-adherence to these benchmarks would have had in this particular case.

In the first QCA-based study published in the American Political Science Review, Muriaas et al. (2022) use QCA to examine the interplay between gendered electoral financing (GEF) and other factors in democratic elections to determine whether these mechanisms help achieve gender balance in national parliaments. The six exogenous factors the authors include are whether GEF is state driven (SD), whether there is a quota in action at the time of GEF implementation (Q), whether there is a PR electoral system in place (PR), whether there is centralized candidate selection (CCS), whether political parties are publicly funded (PF), and whether there is a 15% minimum level of women members of parliament (WMP). The outcome to be explained is success (SUC), which refers to a significant increase in gender balance in parliament following a national election.

Before going into the analysis, the authors mention that “[w]ith 31 observations, only a maximum of six conditions may be included [...]” according to MD’s benchmark tables (Muriaas et al. 2022:504). However, at the same time, they point out that “[t]he six conditions to be considered in the QCA are by no means the only factors [...] that may combine with GEF implementation to enhance gender balance in representation. [...] the omitted factors [...] should be considered in any future study of GEF” (Muriaas et al. 2022:506).

The authors list six omitted factors, but their replication data set includes only two of them: whether GEF was directed toward the candidate or the party (PD) and whether GEF was a party penalty rather than a payment (PP). Given the purpose of our replication, we neither seek to interpret the authors’ findings from a substantive perspective nor do we want to comment on methodological issues unrelated to the core argument of our own article.¹⁴ All we are interested in is whether the two omitted factors would have changed the QCA solution in significant ways had the authors not felt obliged to exclude them. Expression (1) summarizes the authors’ main QCA solution, which comprises five causal paths to success (Muriaas et al. 2022:509)

Q \cdot W M P + Q \cdot PR + Q \cdot p f + s d \cdot P F \cdot w m p + s d \cdot C C S \cdot W M P \Leftrightarrow S U {C .}^{15}

(1)

With the two omitted exogenous factors PD and PP added to the analysis, QCA returns 22 rival models that fit the data equally well. That the number of explanatory models has increased should be no surprise. It is a cost often incurred by a higher number of dimensions of variation (see Baumgartner and Thiem 2017:975–79).¹⁶ However, particularly noticeable across these models is the presence of factor PD—one of the factors dropped by the authors. In 21 models, it features in at least one path to the outcome; in 10 models, in at least two paths. Thus, when compared to the original findings presented in Expression (1), the likelihood that at least one model that not only correctly reflects the truth behind the authors’ data is among those 22, but that also includes the factor PD should have increased. However, since, of course, the true causal structure can never have been known for sure in empirical research, we cannot tell whether the possibly true inference about the factor PD has merely been absent from the original solution in Expression (1), or whether the original solution in Expression (1) is partly or even entirely incorrect. Our aim was only to see how, based on the clear results of our simulations, the findings of Muriaas et al. (2022) would have changed if they had included all factors that they strongly believed to be relevant in explaining a significant increase in parliamentary gender balance. And our results suggested that PD, one of the factors dropped because of adherence to MD’s benchmark tables, represents an important component in explaining significant increases in parliamentary gender balance following national elections. How this result is to be interpreted substantively must be left to the experts in the field of political representation.

Conclusions

Methodological literature has long declared that QCA is prone to committing causal fallacies beyond a certain case-to-factor ratio by failing to recognize real data. Consequently, applied users of QCA have been worried that the explanatory models presented to them would be nothing but algorithmic artefacts. We have shown herein that these worries are misplaced, for two reasons. First, the data generated by any proper causal structure can always be regenerated by a purely random process. Demanding that a method abstain from inferences when the data could be non-causal is, therefore, futile. Instead, the expectation should be that a method presents an inference whenever the analyzed data could have been generated by a proper causal structure.

Second, yet more importantly, we have demonstrated that there is nothing in the algorithmic machinery of QCA that puts an upper limit on the number of exogenous factors given a certain number of cases. Our argument did not imply that low case-to-factor ratios incur no costs whatsoever, but we showed that the inferential capabilities of QCA are not at all threatened in the way suggested by MD. In fact, our extensive simulations suggested the exact opposite: If their goal is valid causal inference with QCA, then applied researchers are better off not following MD’s benchmarks but, instead, relying on the current state of knowledge in their respective field.

Footnotes

Acknowledgments

Previous versions of this article have been presented at the 1st Joint Political Methodology Meeting of the Methods of Political Science Section of the German Political Science Association and the Empirical Methodology Working Group of the Swiss Political Science Association, the 9th Annual Conference of the European Political Science Association, the 14th Conference of the European Sociological Association, and the General Conference of the European Consortium for Political Research. We thank all conference participants whose comments helped improve this article. We also thank the three reviewers at Field Methods for their constructive feedback. The authors acknowledge generous financial support from the Swiss National Science Foundation, grant award number PP00P1 170442. To view appendices and replication material for this article, please visit .

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung, grant award number PP00P1 170442.

ORCID iD

Alrik Thiem

Notes

References

Anderson

1938. The problem of causality. Australasian Journal of Psychology and Philosophy 16:127–42.

Avdagic

2010. When are concerted reforms feasible? Explaining the emergence of social pacts in Western Europe. Comparative Political Studies 43:628–57.

Baumgartner

2015. Parsimony and causality. Quality & Quantity 49:839–56.

Baumgartner

2022. Qualitative comparative analysis and robust sufficiency. Quality & Quantity 56:1939–63.

Baumgartner

Thiem

. 2017. Model ambiguities in configurational comparative research. Sociological Methods & Research 46:954–87.

Baumgartner

Thiem

. 2020. Often trusted but never (properly) tested: Evaluating qualitative comparative analysis. Sociological Methods & Research 49:279–311.

Bergmann

. 2008. An introduction to many-valued and fuzzy logic: Semantics, algebras, and derivation systems. Cambridge: Cambridge University Press.

Boogaerts

Drieskens

. 2020. Lessons from the MENA region: A configurational explanation of the (in)effectiveness of UN security council sanctions between 1991 and 2014. Mediterranean Politics 25:71–95.

Braumoeller

B. F.

2015. Guarding against false positives in qualitative comparative analysis. Political Analysis 23:471–87.

10.

Duşa

Thiem

. 2015. Enhancing the minimization of Boolean and multivalue output functions with eQMC. The Journal of Mathematical Sociology 39:92–108.

11.

Fagerholm

. 2016. Social democratic parties and the rise of ecologism: A comparative analysis of Western Europe. Comparative European Politics 14:547–71.

12.

Grauvogel

von Soest

. 2014. Claims to legitimacy count: Why sanctions fail to instigate democratisation in authoritarian regimes. European Journal of Political Research 53:635–53.

13.

Holvoet

Dewachter

. 2013. Multiple paths to effective national evaluation societies: Evidence from 37 low- and middle-income countries. American Journal of Evaluation 34:519–44.

14.

Hug

2013. Qualitative comparative analysis: How inductive use and measurement error lead to problematic inference. Political Analysis 21:252–65.

15.

Ide

. 2018. Does environmental peacemaking between states work? Insights on cooperative environmental agreements and reconciliation in international rivalries. Journal of Peace Research 55:351–65.

16.

Lieberson

2004. Comments on the use and utility of QCA. Qualitative Methods 2:13–4.

17.

Lucas

S. R.

Szatrowski

. 2014. Qualitative comparative analysis in critical perspective. Sociological Methodology 44:1–79.

18.

Mackie

J. L.

1965. Causes and conditions. American Philosophical Quarterly 2:245–64.

19.

Marques

Salavisa

. 2017. Young people and dualization in Europe: A fuzzy set analysis. Socio Economic Review 15:135–60.

20.

Marx

2010. Crisp-set qualitative comparative analysis (csQCA) and model specification: Benchmarks for future csQCA applications. International Journal of Multiple Research Approaches 4:138–58.

21.

Marx

Dusa

. 2011. Crisp-set qualitative comparative analysis (csQCA), contradictions and consistency benchmarks for model specification. Methodological Innovations Online 6:103–48.

22.

Misangyi

V. F.

2016. Institutional complexity and the meaning of loose coupling: Connecting institutional sayings and (not) doings. Strategic Organization 14:407–40.

23.

Muriaas

Mazur

A. G.

Hoard

. 2022. Payments and penalties for democracy: Gendered electoral financing in action worldwide. American Political Science Review 116:502–15.

24.

Rihoux

Alamos Concha

Bol

Marx

Rezsöhazy

. 2013. From niche to mainstream method? A comprehensive mapping of QCA applications in journal articles from 1984 to 2011. Political Research Quarterly 66:175–84.

25.

Rohlfing

2018. Power and false negatives in qualitative comparative analysis: Foundations, simulation and estimation for empirical studies. Political Analysis 26:72–89.

26.

Schneider

Sadowski

. 2010. The impact of new public management instruments on PhD education. Higher Education 59:543–65.

27.

Thiem

2014. Navigating the complexities of qualitative comparative analysis: Case numbers, necessity relations, and model ambiguities. Evaluation Review 38:487–513.

28.

Thiem

2015. Using qualitative comparative analysis for identifying causal chains in configurational data: A methodological commentary on Baumgartner and Epple (2014). Sociological Methods & Research 44:723–36.

29.

Thiem

2018. Improving the use of qualitative comparative analysis for inferring complex causation in development and planning research. Journal of Water, Sanitation and Hygiene for Development 8:622–31.

30.

Thiem

2022a. Qualitative comparative analysis (QCA). In Handbook of research methods in international relations, eds. Huddleston

R. J.

Jamieson

James

, 607–28. Cheltenham, UK: Edward Elgar.

31.

Thiem

2022b. Beyond the facts: Limited empirical diversity and causal inference in qualitative comparative analysis. Sociological Methods & Research 51:527–40.

32.

Thiem

Baumgartner

. 2016. Modeling causal irrelevance in evaluations of configurational comparative methods. Sociological Methodology 46:345–57.

33.

Thiem

Baumgartner

Bol

. 2016a. Still lost in translation! A correction of three misunderstandings between configurational comparativists and regressional analysts. Comparative Political Studies 49:742–74.

34.

Thiem

Duşa

. 2013. Boolean minimization in social science research: A review of current software for qualitative comparative analysis (QCA). Social Science Computer Review 31:505–21.

35.

Thiem

Mkrtchyan

Haesebrouck

Sanchez

. 2020. Algorithmic bias in social research: A meta-analysis. PLoS ONE 15:e0233625.

36.

Thiem

Spöhel

Duşa

. 2016b. Enhancing sensitivity diagnostics for qualitative comparative analysis: A combinatorial approach. Political Analysis 24:104–20.

37.

Wagemann

Buche

Siewert

M. B.

. 2016. QCA and business research: Work in progress or a consolidated agenda? Journal of Business Research 69:2531–40.

38.

Zhu

Wang

Sun

Müller

. 2019. Influencing factors of horizontal leaders’ role identity in projects: A sequential mixed method approach. International Journal of Project Management 37:582–98.

Case-to-factor Ratios and Model Specification in Qualitative Comparative Analysis

Abstract

Introduction

The Context of Marx and Duşa’s Benchmark Tables

Evaluating Marx and Duşa’s Benchmark Tables

Problems in Design

Consequences for Causal Inference

Conclusions

Footnotes

Acknowledgments

Declaration of Conflicting Interests

Funding

ORCID iD

Notes

References