Abstract

Many public policy studies (Martin and Scott 2021) use randomized field experiments to draw causal conclusions (e.g., Chen et al. 2020). A typical randomized field experiment involves a control group and a treatment group to which individual units (e.g., consumers, patients) are randomly assigned, after which an intervention (e.g., a marketing program) is implemented in the treatment group. To assess the efficacy of an intervention, researchers typically estimate the average treatment effect, computed as the mean difference in the outcome between units in the treatment group and units in the control group. When applying the results of a randomized experiment, researchers often assume that the treatment effect is the same for all units assigned to the treatment condition. This may not always be the case, as the treatment might have different causal effects on different subgroups. More formally, this variation is called treatment effect heterogeneity. For example, the treatment effect may differ for men and women or for those who have different types of insurance coverage. By accounting for treatment effect heterogeneity, public policy researchers can gain a more nuanced understanding of the efficacy of an intervention. Specifically, they can ascertain how the treatment effect varies across units based on characteristics that are not manipulated in the experiment. We discuss three prominent approaches to account for treatment effect heterogeneity: analysis of variance (ANOVA)/regression with covariates and moderators, a random-coefficients model, and causal forests (see Table 1). We illustrate each approach using a simulated version of the data in Chen et al. (2020). Our goal is to provide a practical understanding of each approach, with a focus on causal forests.
Screening Completion for Liver Cancer: A Stylized Example
People at risk for liver cancer or hepatocellular carcinoma (HCC) should undergo semiannual screening to facilitate early detection, which can be lifesaving. Due to low screening rates, health care institutions invest in patient outreach programs to encourage and increase screening rates among at-risk populations. To preserve confidentiality, we simulated the data for a hypothetical experiment that broadly follows the randomized experiment in Chen et al. (2020).
This article focuses on two randomized conditions: the control group (no-outreach condition [n = 600]) and the treatment group (outreach condition [n = 600]). In the control group, patients received visit-based HCC screening as recommended by primary or specialty care providers and were not contacted by anyone else. In the treatment group (outreach), patients were mailed a one-page letter describing (1) the risk of HCC in patients with cirrhosis, (2) the benefits and risks of HCC screening tests, (3) a summary of the screening procedure, and (4) a recommendation to the patient to make an appointment for an ultrasound.
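The dependent variable was simulated as the probability of a patient completing the screening (a continuous variable ranging from 0 to 1). The basic dummy-variable regression equation used to estimate the average treatment effect (or main effect) of the outreach intervention is written as

$$Y_i = \beta_0 + \beta_1 W_i + \varepsilon_i, \tag{1}$$

where $Y_i$ is the screening probability of patient i, $W_i$ equals 1 if patient i was assigned to the outreach condition and 0 otherwise, and $\beta_1$ is the average treatment effect.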
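As a minimal sketch of this estimate, the Python snippet below simulates a hypothetical two-arm experiment and checks that the difference in group means equals the slope of a regression on the treatment dummy. The data-generating numbers (baseline of 0.30, lifts of 0.10 and 0.20, noise of 0.05) and all function names are assumptions chosen for illustration; they are not the values used in the article.

```python
import random
import statistics


def simulate_patients(n_per_arm=600, seed=0):
    """Simulate a hypothetical outreach experiment (all numbers are illustrative)."""
    rng = random.Random(seed)
    patients = []
    for w in (0, 1):  # 0 = no outreach (control), 1 = outreach (treatment)
        for _ in range(n_per_arm):
            female = rng.randint(0, 1)
            # Assumed process: baseline 0.30, lift 0.10 (men) or 0.20 (women).
            y = 0.30 + w * (0.10 + 0.10 * female) + rng.gauss(0, 0.05)
            patients.append({"w": w, "female": female, "y": min(1.0, max(0.0, y))})
    return patients


def difference_in_means(patients):
    """Average treatment effect: mean outcome under outreach minus mean under control."""
    treated = [p["y"] for p in patients if p["w"] == 1]
    control = [p["y"] for p in patients if p["w"] == 0]
    return statistics.mean(treated) - statistics.mean(control)


def dummy_regression_slope(patients):
    """OLS slope from regressing y on the treatment dummy w (single-regressor closed form)."""
    w_bar = statistics.mean(p["w"] for p in patients)
    y_bar = statistics.mean(p["y"] for p in patients)
    s_wy = sum((p["w"] - w_bar) * (p["y"] - y_bar) for p in patients)
    s_ww = sum((p["w"] - w_bar) ** 2 for p in patients)
    return s_wy / s_ww


patients = simulate_patients()
ate = difference_in_means(patients)
slope = dummy_regression_slope(patients)
print(f"difference in means: {ate:.4f}, regression slope: {slope:.4f}")
```

The two numbers coincide because, with a single binary regressor, the least-squares slope is algebraically the difference between the two group means.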
The three approaches described in Table 1 can be used to determine if the benefit of outreach differs for subgroups such as for male versus female patients or patients with different types of insurance coverage. The three approaches are ANOVA/regression with covariates and moderators, random-coefficients model, and causal forests. The goal is to ascertain if the average treatment effect should be adjusted downward or upward in different subgroups. The first two approaches have been widely used in marketing and are described in the Web Appendix.
Table 1. A Methodological Comparison: ANOVA/Regression, Random-Coefficients Model, and Causal Forests.
This commentary focuses on causal forests, a technique with a different conceptual focus. ANOVA/regression and random-coefficients models are top-down ways to think about treatment effect heterogeneity. They start by comparing the mean of the outcome variable in the treatment group with the mean of the outcome variable in the control group to get the average treatment effect. Next, they uncover how the main effect may vary among subgroups. ANOVA/regression uses interaction terms to dissect the main effect by various subgroups (e.g., gender, insurance type, a combination of gender and insurance type), while the random-coefficients models use a combination of data stacking, random slopes, and interactions to dissect the main effects.
Top-down approaches are feasible and efficient when the researcher is interested in testing how the main effect changes across a relatively small number of moderating conditions (e.g., two to five) or has a priori ideas about the moderators. However, in many field experiments, researchers may have several dozen subgroup variables and a relatively small sample size. For example, the researcher may have more than 100 variables from the electronic medical records and census data based on a patient's geographic location (e.g., household income, retail growth, unemployment rate). Including these covariates as moderators in the regression or random-coefficients model is not feasible as the model will run out of degrees of freedom. Moreover, the researcher may not have any basis to a priori specify theory-driven moderators in the model.
A bottom-up approach such as causal forests uncovers treatment effect heterogeneity with a large number of subgroup variables. This involves obtaining the treatment effect estimate for each unit in the sample and then relating individual-level treatment effects to a variety of covariates to understand how high-treatment-effect individuals differ from low-treatment-effect individuals. Next, we describe the steps involved in a causal forests approach.
Causal Forests: Key Steps
Step 1. Obtaining Individual-Level Treatment Effects
To obtain the treatment effect for each of the 600 patients in the control group, the researcher needs an approach to estimate the lift in screening probability for every patient in the control group if they were instead placed in the treatment group. This is inherently an error-prone prediction, as a patient who was placed in the control group could not have simultaneously been placed in the treatment group. The researcher also needs an approach to estimate the lift in screening probability for every patient in the treatment group as if they were instead placed in the control group. This is again an error-prone prediction, because a patient who was placed in the treatment group could not have simultaneously been placed in the control group.
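The prediction problem can be stated compactly in potential-outcomes notation. The symbols $Y_i(1)$ and $Y_i(0)$ (the screening probability patient i would have with and without outreach) and $W_i$ (the treatment indicator) are notation assumed for this sketch:

```latex
\tau_i = Y_i(1) - Y_i(0),
\qquad
Y_i^{\mathrm{obs}} = W_i \, Y_i(1) + (1 - W_i) \, Y_i(0)
```

Only one of the two potential outcomes is observed for any patient, so $\tau_i$ is never observed directly and has to be predicted from comparable patients in the other condition.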
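How can the researcher obtain the treatment effect for every patient in the sample? A starting point is to look at the mean difference in the outcome between the patients in the treatment group and those in the control group within a particular subgroup (e.g., women); that is,

$$\tau_{\text{women}} = E[Y_i \mid W_i = 1, \text{Female}_i = 1] - E[Y_i \mid W_i = 0, \text{Female}_i = 1], \tag{2}$$

where $Y_i$ is the outcome of patient i and $W_i$ indicates assignment to the treatment group.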
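Drawing on this logic, the researcher can define a more fine-grained subgroup for every patient i by using more patient characteristics to split the sample into subgroups. If we have m covariates labeled X1 to Xm, we could write Equation 2 as

$$\tau(x_1, \ldots, x_m) = E[Y_i \mid W_i = 1, X_{1i} = x_1, \ldots, X_{mi} = x_m] - E[Y_i \mid W_i = 0, X_{1i} = x_1, \ldots, X_{mi} = x_m], \tag{3}$$

with $Y_i$ and $W_i$ denoting the outcome and treatment assignment of patient i.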
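A small simulation can make the subgroup logic concrete and show how quickly fine-grained subgroups thin out. The sketch below is illustrative only: the 12 binary covariates, the data-generating process, and the function names are assumptions, not the article's data.

```python
import random
import statistics
from collections import defaultdict


def simulate(n=1200, m=12, seed=1):
    """Hypothetical data: m binary covariates, random assignment, lift depends on x[0] only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(m))
        w = rng.randint(0, 1)
        y = 0.30 + w * (0.10 + 0.10 * x[0]) + rng.gauss(0, 0.05)
        rows.append((x, w, y))
    return rows


def subgroup_effects(rows, k):
    """Treated-minus-control mean outcome within each cell defined by the first k covariates."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, w, y in rows:
        cells[x[:k]][w].append(y)
    return {
        cell: statistics.mean(arms[1]) - statistics.mean(arms[0])
        for cell, arms in cells.items()
        if arms[0] and arms[1]  # a difference needs both treated and control patients
    }


rows = simulate()
for k in (1, 4, 8, 12):
    effects = subgroup_effects(rows, k)
    print(f"k={k:2d}: {2 ** k:5d} possible cells, {len(effects):4d} with an estimable effect")
```

With one covariate, both subgroups have hundreds of patients in each condition. With 12 binary covariates there are 4,096 possible cells for 1,200 patients, so most cells are empty or lack either treated or control patients.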
Such fine-grained subgroups quickly become too small to support a reliable estimate, and with many covariates a subgroup may contain no treated or no control patients at all. The causal forest algorithm addresses this challenge by forming data-driven subgroups. The algorithm splits the data into subgroups that share a similar profile of patient characteristics and uses the within-subgroup treatment effect as the estimate for any patient who belongs to the corresponding subgroup. To reduce bias, it uses one part of the sample to determine the subgroups and the other part to estimate the treatment effects. To reduce the variance of the treatment effect estimates, it repeats the procedure over many random draws of the sample and averages the estimates. This algorithm has been further developed into generalized random forests, in which the trees are used not to compute the treatment effect estimates directly but to create individual-specific weights (Athey, Tibshirani, and Wager 2019). Concretely, for a set of independent and identically distributed patients, indexed i = 1, …, n, we observe the outcome of interest Y_i (screening probability), the treatment assignment W_i, and a vector of patient characteristics X_i (e.g., gender and insurance type). The patient-level treatment effect estimate, written here as $\hat{\tau}(X_i)$, is the estimated difference in expected screening probability with and without outreach for a patient with characteristics X_i.
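The two ideas in this paragraph, sample splitting and averaging over random draws, can be illustrated with a deliberately simplified toy. The sketch below is not the causal forest or generalized random forest algorithm: each "tree" makes a single split on one binary covariate, the splitting rule is a crude stand-in, and the data and function names are assumptions for illustration.

```python
import random
import statistics


def simulate(n=1200, m=3, seed=2):
    """Hypothetical data: m binary covariates; the outreach lift depends on x[0] only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(m))
        w = rng.randint(0, 1)
        y = 0.30 + w * (0.10 + 0.10 * x[0]) + rng.gauss(0, 0.05)
        rows.append((x, w, y))
    return rows


def effect(rows):
    """Treated-minus-control mean outcome, or None if either arm is empty."""
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    if not treated or not control:
        return None
    return statistics.mean(treated) - statistics.mean(control)


def choose_split(rows, m):
    """Pick the covariate whose two subgroups differ most in estimated effect."""
    best_j, best_gap = None, -1.0
    for j in range(m):
        lo = effect([r for r in rows if r[0][j] == 0])
        hi = effect([r for r in rows if r[0][j] == 1])
        if lo is not None and hi is not None and abs(hi - lo) > best_gap:
            best_j, best_gap = j, abs(hi - lo)
    return best_j


def honest_stump_forest(rows, m, n_trees=200, seed=0):
    """Average one-split trees; a patient is scored only by trees that did not draw them."""
    rng = random.Random(seed)
    n = len(rows)
    totals, counts = [0.0] * n, [0] * n
    for _ in range(n_trees):
        drawn = rng.sample(range(n), n // 2)          # random subsample of patients
        half = len(drawn) // 2
        split_rows = [rows[i] for i in drawn[:half]]  # used only to choose the subgroups
        est_rows = [rows[i] for i in drawn[half:]]    # used only to estimate the effects
        j = choose_split(split_rows, m)
        if j is None:
            continue
        leaf = {v: effect([r for r in est_rows if r[0][j] == v]) for v in (0, 1)}
        if leaf[0] is None or leaf[1] is None:
            continue
        in_bag = set(drawn)
        for i in range(n):
            if i not in in_bag:
                totals[i] += leaf[rows[i][0][j]]
                counts[i] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]


rows = simulate()
tau_hat = honest_stump_forest(rows, m=3)
```

Because the subgroups are chosen on one half of each subsample and the effects are estimated on the other half, the estimate in a subgroup is not inflated by the fact that the split was selected to look large. Averaging over 200 draws smooths out the noise of any single split.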
As an illustration, Figure 1 shows the distribution of the patient-level treatment effect estimates.

Figure 1. Histogram of patient-level treatment effect estimates.
Step 2. Discovering Patterns in Treatment Effects Across Individuals
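After collecting these treatment effect estimates, we can conduct a second-stage analysis to study how they vary by patient characteristics, using a linear regression of the doubly robust patient-level estimates on patient characteristics such as gender and insurance type.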
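A second-stage regression of this kind can be sketched as follows. This is an illustration under stated assumptions, not the article's procedure: the doubly robust scores here use simple arm means as the outcome model instead of forest-based predictions, the assignment probability is taken as the known value 0.5, and the covariate names `female` and `medicaid` are hypothetical.

```python
import random
import statistics


def simulate(n=1200, seed=3):
    """Hypothetical data with two binary covariates (names are illustrative)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        female, medicaid = rng.randint(0, 1), rng.randint(0, 1)
        w = rng.randint(0, 1)  # randomized with probability 0.5
        y = 0.30 + w * (0.10 + 0.10 * female) + rng.gauss(0, 0.05)
        rows.append((female, medicaid, w, y))
    return rows


def doubly_robust_scores(rows, propensity=0.5):
    """AIPW-style scores using arm means as the outcome model (a deliberate simplification)."""
    mu1 = statistics.mean(y for _, _, w, y in rows if w == 1)
    mu0 = statistics.mean(y for _, _, w, y in rows if w == 0)
    return [
        (mu1 - mu0)
        + w * (y - mu1) / propensity
        - (1 - w) * (y - mu0) / (1 - propensity)
        for _, _, w, y in rows
    ]


def ols(X, y):
    """OLS coefficients from the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [
        [sum(r[a] * r[b] for r in X) for b in range(k)]
        + [sum(r[a] * yi for r, yi in zip(X, y))]
        for a in range(k)
    ]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [u - factor * v for u, v in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


rows = simulate()
scores = doubly_robust_scores(rows)
design = [(1.0, female, medicaid) for female, medicaid, _, _ in rows]
intercept, b_female, b_medicaid = ols(design, scores)
print(f"intercept={intercept:.3f}, female={b_female:.3f}, medicaid={b_medicaid:.3f}")
```

In this simulated setup the lift was built to be 0.10 larger for women and unrelated to the second covariate, so the coefficient on `female` should land near 0.10 and the coefficient on `medicaid` near zero. The sketch omits standard errors, which a real analysis would need in order to produce significance tests.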
Sources of Treatment Effect Heterogeneity Based on Doubly Robust Estimates.
Notes: *p < .05; **p < .01; ***p < .001.
Pros and Cons of Causal Forests
A key advantage of causal forests is the ability to uncover individual-level treatment effects with valid confidence intervals. The approach also systematically detects unexpected heterogeneity without (1) the need for a larger number of experimental conditions, (2) restrictions on the number of covariates, or (3) limits on the nature and number of interactions among covariates.
In terms of cons, the output of causal forests is difficult to interpret with respect to the sources of heterogeneity. In addition, causal forests is only one of many estimators of heterogeneous treatment effects, and formal guidance for choosing the best estimator in a given context is lacking. This poses two challenges. First, relying on a single method might leave researchers too much freedom to make an arbitrary modeling choice. Second, any given method may perform poorly in certain regions of the feature space. To gain more confidence in their conclusions, researchers can compare multiple estimators of treatment effect heterogeneity and evaluate whether these estimators agree on the assignment of each individual (Künzel, Walter, and Sekhon 2019).
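One simple way to operationalize such a comparison is to check how often two estimators place the same patient on the same side of a decision threshold. The sketch below is one possible reading of "agreement", not a procedure taken from the cited work; the function name, the threshold of 0.15, and the five pairs of estimates are made up for illustration.

```python
def agreement_rate(estimates_a, estimates_b, threshold=0.0):
    """Share of patients that both estimators place on the same side of `threshold`."""
    if len(estimates_a) != len(estimates_b) or not estimates_a:
        raise ValueError("need two non-empty lists of equal length")
    same = sum((a > threshold) == (b > threshold) for a, b in zip(estimates_a, estimates_b))
    return same / len(estimates_a)


# Toy, made-up estimates for five patients from two hypothetical estimators.
forest_like = [0.21, 0.18, 0.09, 0.12, 0.22]
other_method = [0.19, 0.20, 0.11, 0.16, 0.08]

rate = agreement_rate(forest_like, other_method, threshold=0.15)
print(rate)  # 3 of the 5 patients fall on the same side for both estimators
```

A low agreement rate would suggest that conclusions about which patients benefit most depend on the choice of estimator and should be reported with caution.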
Conclusion
Causal forests is an emerging technique that complements ANOVA/regression and random-coefficients models in accounting for treatment effect heterogeneity. As Table 1 shows, each approach is slightly different, and no single approach is perfect for incorporating treatment effect heterogeneity. Our larger hope is that the public policy community embraces emerging approaches such as causal forests to provide more nuanced recommendations to policy makers in field experiments related to domains such as nutrition, educational programs, safety training evaluation, sustainability, and donation behavior, among others.
Supplemental Material
Supplemental Material, sj-pdf-1-ppo-10.1177_07439156211032751 for Treatment Effect Heterogeneity in Randomized Field Experiments: A Methodological Comparison and Public Policy Implications by Yixing Chen, Shrihari Sridhar and Vikas Mittal in Journal of Public Policy & Marketing
Footnotes
Special Issue Guest Coeditors
Brennan Davis, Dhruv Grewal, and Steve Hamilton
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
