Abstract
Varenicline has shown promise for treating alcohol use disorder (AUD); however, not everyone will respond to varenicline. Machine-learning methods are well suited to identify treatment responders. In the present study, we examined data from the National Institute on Alcohol Abuse and Alcoholism Clinical Intervention Group multisite clinical trial of varenicline using two machine-learning methods. Baseline characteristics taken from a randomized clinical trial of varenicline were examined as potential moderators of treatment response using qualitative interaction trees (
Keywords
Varenicline is a partial agonist of the α4β2 nicotinic acetylcholine receptor. It is approved by the Food and Drug Administration for smoking cessation but has shown promise for the treatment of alcohol use disorder (AUD) because nicotinic acetylcholine receptors are implicated in both alcohol and nicotine reward (Lê et al., 2000). Results supporting varenicline’s efficacy as an AUD treatment have been mixed. Two preliminary randomized controlled trials (RCTs) found varenicline to be effective at reducing drinking and alcohol craving (Fucito et al., 2011; Mitchell et al., 2012) in individuals who smoke and drink heavily. These results were confirmed by a large multisite RCT, which found reductions in heavy drinking days, drinks per drinking day, and alcohol craving in individuals with an AUD (Litten et al., 2013). However, several RCTs have found no effect of varenicline on drinking outcomes (de Bejczy et al., 2015; Plebani et al., 2013), and a multisite human laboratory investigation found that varenicline did not reduce alcohol cue-induced craving (Miranda et al., 2020). A recent study by our group found that varenicline plus naltrexone was superior to varenicline plus placebo for reducing drinks per drinking day in a sample of heavy-drinking smokers who were treatment seeking for smoking cessation and drinking reduction (Ray et al., 2021).
Although varenicline shows promise, we note that not all individuals will respond to varenicline, suggesting the potential for treatment moderators, much as is the case for other AUD pharmacotherapies. Treatment moderators are variables for which the effect of treatment may vary across the values of the moderator. Note that moderators do not necessarily cause a difference in the treatment effect (i.e., intervening on moderators may not change the treatment effect) but are descriptive of populations across which the treatment effect may vary. To that end, identifying treatment responders to varenicline represents a promising research area. Falk et al. (2015) conducted an analysis examining 17 potential moderators of varenicline’s efficacy and found four significant moderators: reductions in cigarette smoking, treatment drinking goal, years of regular drinking, and patient age. More recently, using the same multisite trial, Donato et al. (2021) found that in individuals with low-severity AUD, varenicline reduced drinking and improved quality of life relative to placebo, whereas high-severity individuals did not benefit from treatment with varenicline. In sum, moderator analyses to date have suggested that AUD severity, as a multidimensional factor (Donato et al., 2021) or as individual indicators (Falk et al., 2015), may help identify treatment responders to varenicline. However, both of these previous studies used traditional statistical methods such as linear regression and linear mixed models with null hypothesis testing. These methods are limited because they frequently cannot identify complex patterns of moderation that involve multiple variables or allow for many candidate moderators.
An alternate approach to identifying treatment responders relies more heavily on data-driven methods than a priori moderators. Specifically, machine-learning methods have become increasingly popular and are well suited for efforts to identify treatment responders using data-driven tools. One benefit of machine-learning approaches is their focus on prediction accuracy over traditional statistical significance indicated by a
In the field of AUD-treatment development, machine-learning approaches were used in a recent reanalysis of the multisite trial of Gabapentin (Laska et al., 2020). Specifically, this study used a random forest based on prerandomization (i.e., baseline) variables to identify likely responders. The primary report from the clinical trial found no significant treatment effect (Falk et al., 2019). Results from the machine-learning analysis revealed that likely responders had a higher number of heavy-drinking days, lower levels of anxiety and mood symptoms, and higher cognitive and motor impulsivity scores at baseline relative to unlikely responders (Laska et al., 2020). Thus, likely responders to gabapentin may demonstrate higher levels of externalizing symptoms and lower levels of internalizing symptoms compared with unlikely responders. Likewise, Gueorguieva et al. (2015) used tree-based approaches to assess treatment moderators in the COMBINE study and identified individuals with shorter abstinence periods who are not overweight or obese as likely responders to acamprosate.
The application of machine learning to predicting treatment response is in its infancy, and there is no current consensus on best methods (Lipkovich et al., 2017). Regression trees and random forests are excellent at building models with complex interactions, which are helpful for identifying treatment responders because the hypothesis involves an interaction between treatment and other baseline variables. These methods iteratively split participants according to candidate variables to maximize homogeneity in the outcome within groups, maximizing predictive accuracy. However, tree-based approaches can produce very complex models that are difficult to interpret.
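The splitting step described above can be sketched as follows. This is a minimal illustration of how a generic regression tree chooses one split by minimizing the within-group sum of squared errors of the outcome; it is not the software used in this study, and the function names and toy data are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the split on x that minimizes
    the within-group sum of squared errors of y."""
    candidates = sorted(set(x))
    best = (None, sse(y))  # baseline: no split at all
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2  # midpoint between adjacent observed values
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best


# Hypothetical toy data: one candidate variable and one outcome.
age = [25, 30, 35, 50, 55, 60]
outcome = [0.9, 0.8, 0.85, 0.3, 0.2, 0.25]
threshold, err = best_split(age, outcome)
print(threshold, err)
```

A full tree repeats this search within each resulting group and across all candidate variables, which is how the complex, hard-to-interpret models mentioned above can arise.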
An alternative approach involves a regularized linear model, such as least absolute shrinkage and selection operator (lasso) and related methods, including group lasso and elastic net. These models use a penalty such that regression coefficients are shrunk toward zero, resulting in only a subset of predictors included in the model. These models align well with clinical researchers’ understanding of linear regression and allow for more interpretable models. However, they are limited to linear models and require specification of interactions. Applications of the lasso in detecting treatment responders typically allow for only two- or three-way interactions (Ballarini et al., 2018), whereas random forests easily identify nonlinear relationships and complex interactions of any form. Both regression-tree methods and regularization methods show promise for being able to predict treatment response; however, these methods have complementary benefits and limitations. On the basis of the novelty of machine-learning approaches and the pros and cons of various methods, we explore using one of each method (i.e., tree-based methods vs. regularization methods) and compare their relative performance. Although future simulation research must compare the performance of these methods, researchers may also select a method based on practical concerns, such as software implementation, types of results output, and methods for handling missing data. Although this study cannot provide recommendations for which methods will perform best at prediction in applications beyond the current one, the results of this study can speak to these practical concerns.
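The shrinkage behavior of the lasso penalty can be illustrated with the soft-thresholding operator. In the simplified case of an orthonormal design, the lasso estimate equals the unpenalized estimate pulled toward zero by the penalty and set to exactly zero if it is smaller than the penalty. The sketch below assumes that simplified case; the coefficient values and the penalty are hypothetical.

```python
def soft_threshold(beta, lam):
    """Shrink a coefficient toward zero by lam; return 0.0 if |beta| <= lam."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0


# Hypothetical unpenalized coefficients and penalty strength.
unpenalized = {"age": 0.8, "heart_rate": -0.15, "weight": 0.05}
penalty = 0.2

shrunk = {name: soft_threshold(b, penalty) for name, b in unpenalized.items()}
selected = [name for name, b in shrunk.items() if b != 0.0]
print(shrunk)
print(selected)
```

Only the predictor whose coefficient exceeds the penalty in absolute value remains in the model, which is the sense in which these methods retain a subset of predictors.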
Toward identifying responders to varenicline and improving the application of machine learning to the prediction of clinical response in AUD, in the present study, we examined data from the National Institute on Alcohol Abuse and Alcoholism (NIAAA) Clinical Intervention Group (NCIG) trial of varenicline using two machine-learning methods. These data-driven approaches differ from the two previous studies of a priori predictors (Donato et al., 2021; Falk et al., 2015) and have the benefit of prioritizing out-of-sample prediction and allowing for data-driven discovery. This study aims to identify treatment responders using a host of clinical and demographic baseline variables and to compare two machine-learning methods. Specifically, we compared (a) the qualitative interaction tree (QUINT; Dusseldorp & Van Mechelen, 2014), which is a specific type of regression tree, and (b) the group lasso interaction net (glinternet; Lim & Hastie, 2015), which is a specific type of group lasso. This comparison seeks to elucidate best practices for AUD clinical-trial treatment-responder investigations.
Transparency and Openness
The studies or analyses reported in this article were not preregistered. Data for this study are available through NIAAA’s NCIG. A material transfer agreement is needed to obtain the data. Furthermore, the materials, code, and other resources developed as part of the research reported for this study can be obtained through reasonable request to the corresponding author. This study involved an analysis of existing data rather than new data collection and did not require additional ethical approval.
Method
Study overview
Data for this secondary analysis were drawn from a 13-week, multisite, Phase 2, double-blind, placebo-controlled, parallel group trial of varenicline (NCT01146613; Litten et al., 2013). This data set was publicly available to qualified investigators and was obtained by L. A. Ray through a material transfer agreement with the NCIG/NIAAA. Participants were randomly assigned to varenicline or placebo on a 1:1 ratio; random assignment was stratified by trial site and regular smoking, defined as 10 or more or fewer than 10 cigarettes smoked per day for the past week. Varenicline dosage was titrated as follows: A starting dose of 0.5 mg was taken once a day on Days 1 through 3; next, a dose of 0.5 mg was taken twice a day on Days 4 through 7; and the target dose of 1 mg taken twice daily was maintained from Weeks 2 through 13 (for dosage methods, see Litten et al., 2013).
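The titration schedule can be written as a small lookup. This sketch assumes that Week 2 begins on Day 8 and that Week 13 ends on Day 91; the function name is hypothetical.

```python
def scheduled_dose(study_day):
    """Return (mg_per_dose, doses_per_day) for a 1-indexed study day.

    Assumes Week 2 starts on Day 8 and Week 13 ends on Day 91.
    """
    if not 1 <= study_day <= 91:
        raise ValueError("study_day must be between 1 and 91")
    if study_day <= 3:
        return (0.5, 1)  # Days 1-3: 0.5 mg once a day
    if study_day <= 7:
        return (0.5, 2)  # Days 4-7: 0.5 mg twice a day
    return (1.0, 2)      # Weeks 2-13: 1 mg twice a day


for day in (1, 4, 8, 91):
    mg, times = scheduled_dose(day)
    print(day, mg * times)  # total scheduled mg on that day
```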
Study population
Randomly assigned study participants were 200 individuals (142 males, 58 females) diagnosed with past-year alcohol dependence according to the DSM
Measures
Primary and secondary study endpoints
In the current analysis, we used the primary and secondary drinking endpoints from the original RCT. The primary efficacy outcome for the trial was percent heavy drinking days (PHDD), defined as 4 or more drinks per day for females and 5 or more drinks per day for males, during the maintenance phase of the trial (Weeks 2–13). Secondary drinking outcomes examined in the current study were drinks per day (DPD), drinks per drinking day (DPDD), and percent very heavy drinking days (PVHDD; 8+/10+ drinks per drinking day for females and males, respectively) during the maintenance phase of the trial. All drinking measures were captured through the Timeline Followback and Form 90 interview methodology and procedures (Miller, 1996; Sobell & Sobell, 1992).
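The endpoint definitions above can be made concrete with a short sketch that computes the four drinking outcomes from a list of daily drink counts. The thresholds are those stated in the text; the function names and the one-week example record are hypothetical, and the actual trial outcomes were computed over the maintenance phase (Weeks 2–13).

```python
HEAVY = {"female": 4, "male": 5}        # drinks per day for a heavy drinking day
VERY_HEAVY = {"female": 8, "male": 10}  # drinks per day for a very heavy drinking day


def percent_days_at_or_above(daily_drinks, threshold):
    """Percentage of days with at least `threshold` drinks."""
    if not daily_drinks:
        raise ValueError("daily_drinks must not be empty")
    return 100.0 * sum(d >= threshold for d in daily_drinks) / len(daily_drinks)


def drinking_outcomes(daily_drinks, sex):
    """Compute PHDD, PVHDD, DPD, and DPDD for one participant."""
    drinking_days = [d for d in daily_drinks if d > 0]
    return {
        "PHDD": percent_days_at_or_above(daily_drinks, HEAVY[sex]),
        "PVHDD": percent_days_at_or_above(daily_drinks, VERY_HEAVY[sex]),
        "DPD": sum(daily_drinks) / len(daily_drinks),
        "DPDD": sum(drinking_days) / len(drinking_days) if drinking_days else 0.0,
    }


# Hypothetical one-week record for a male participant.
week = [0, 2, 5, 6, 0, 10, 3]
out = drinking_outcomes(week, "male")
print(out)
```

In this example, three of seven days meet the heavy-drinking threshold and one meets the very-heavy threshold; DPD averages over all seven days, whereas DPDD averages over the five days on which any drinking occurred.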
Baseline measures
Candidate predictors were selected according to theoretical and practical concerns. A list of 382 variables was collected from the original trial data. Of these, 240 variables were excluded because of limited theory suggesting their candidacy as potential moderators (e.g., usual occupation). An additional 77 variables were excluded for being collected after baseline. Finally, 29 variables were excluded for having limited variance or low cell frequency (e.g., ethnicity). This resulted in 36 candidate predictors, described below. Summary statistics for all candidate predictors are included in the Supplemental Material available online.
Demographics
Baseline demographic variables were assessed during the initial screening visit. Variables included in the present analysis were sex (male or female), age, race (White, Black or African American, more than one race, or other race), marital status (divorced, legally married, living with partner or cohabiting, never married, separated, widowed), formal education (high school, more than high school), and past 30-day employment (full-time 35+ hours per week, part-time regular hours, part-time irregular hours or day work, military service, homemaker, retired, disabled, student, unemployed).
Alcohol-use variables
The following variables assessed alcohol use severity and history:
Drinking- and treatment-history variables were collected as part of the medical-history exam and included age when first started drinking, number of inpatient hospitalizations to reduce or quit drinking, number of outpatient visits to reduce or quit drinking, and number of alcohol-treatment group meetings attended over the past year. Finally, treatment goal was defined as one of the following: (a) total abstinence, (b) occasional drinking, (c) temporary abstinence, (d) controlled drinking, and (e) no stated goal. Other drinking variables included confidence in improving drinking from study participation (1–5 scale; 1 =
Cigarette smoking
Smoking variables included in the present study were smoking stratification from study random assignment, frequency of smoking (not at all, occasional, daily) assessed via Question 1 of the Fagerström Test for Nicotine Dependence (Heatherton et al., 1991), and past-week smoking frequency, including number of cigarettes smoked over the past week and average number of cigarettes smoked per day over the past week.
Quality of life
Quality of life was measured by baseline Mental and Physical Aggregate Scores on the Short Form-12 (Szabo, 1996).
Vital signs
The following vital signs were measured as part of the physical-eligibility screening: weight, sitting systolic blood pressure, sitting diastolic blood pressure, and sitting heart rate.
Medication compliance
Medication compliance was assessed through the number of capsules taken over the course of the trial. This was obtained and recorded according to face-to-face interviews and pill counts at study visits. Because this variable was measured after random assignment, we evaluated whether there were significant differences on compliance between participants randomly assigned to placebo compared with varenicline. We did not find a significant difference,
Statistical analysis
A variety of machine-learning methods have been used for detecting interactions, and some have already been used specifically for identifying treatment responders. We selected one tree-based method and one regularization-based method because these are the two primary types of methods available. Tree-based methods all provide the advantage of allowing for complex interactions; however, traditional tree-based methods (e.g., decision trees, random forest) explore the complete interaction space rather than searching specifically for interactions with treatment. Some methods explore interactions with specific variables (in our case, treatment arm). We considered both the virtual-twins method (Foster et al., 2011) and the simultaneous-threshold-interaction modeling algorithm (Dusseldorp et al., 2010), but both of these are global-outcome modeling methods (Lipkovich et al., 2017), meaning they prioritize accurate prediction of the outcome (in our case, drinking behavior) over accurate prediction of the treatment effect. We also considered interaction trees (Su et al., 2008, 2009), which focus on identifying variables that interact with a specified variable (e.g., treatment arm), which was closer to our goal. However, QUINT models achieve the same goal in addition to identifying qualitative interactions. These are interactions in which the direction of the treatment effect depends on the value of the moderator. The notable contrast is that although interaction trees would potentially split participants because one group responds very well to varenicline and the other group responds only a little to varenicline, the treatment recommendation for those two groups does not vary. Instead, we used QUINT models because they focus on identifying splits that influence the treatment recommendation.
For potential-regularization methods, we considered lasso, ridge, and elastic net, and among these, elastic net tends to perform best. However, these models do not typically (by default) incorporate interactions. Although candidate interactions can be entered into these models, the model will not necessarily enforce “strong hierarchy.” Strong hierarchy is a property of a model such that if an interaction (e.g.,
QUINT
QUINT is a tree-based approach designed to detect qualitative interactions. Qualitative interactions occur when the effect of one variable (e.g., treatment) is reversed across different levels of a moderator (e.g., baseline characteristics; Dusseldorp et al., 2016; Dusseldorp & Van Mechelen, 2014). The method differs from traditional regression trees, which split to maximize prediction accuracy of the outcome by, instead, maximizing prediction accuracy of the treatment effect within a group. QUINTs achieve this goal by splitting groups according to a criterion that combines two characteristics: (a) the magnitude of the treatment effect on the outcome and (b) the size of groups after splitting. Splits aim to maximize both of these two components. After the first split, the treatment effect size is estimated in each of the two leaves, and if this effect size does not exceed a preset minimum, no tree is fit, and it is concluded that there is not sufficient evidence for any qualitative interaction (Dusseldorp et al., 2016). The final subgroups are described as “leaves,” each of which is classified into one of three classes: treatment responders (individuals who improved drinking outcomes when treated with varenicline), iatrogenic responders (individuals whose drinking outcomes were worse when treated with varenicline, i.e., favors placebo), and nonresponders. This method is particularly well suited to detecting groups of treatment responders and nonresponders because it prioritizes differentiating groups of people who differ not only in the size of their treatment effect but also the sign of their treatment effect. QUINT is appropriate for (multi)categorical and continuous predictors and continuous outcomes regardless of distribution. 
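The three leaf classes can be illustrated with a simplified sketch that computes a standardized mean difference within a leaf and compares it with a minimum effect size. This is not the QUINT algorithm or its partitioning criterion; the threshold rule, the default value of `d_min`, the coding of the outcome (larger values mean greater improvement), and the toy data are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference (treated minus control), pooled SD."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


def classify_leaf(treated, control, d_min=0.3):
    """Classify a leaf by the sign and size of its treatment effect.

    Outcomes are assumed to be improvement scores (larger is better);
    d_min is a hypothetical minimum absolute effect size.
    """
    d = cohens_d(treated, control)
    if d >= d_min:
        return "treatment responders"
    if d <= -d_min:
        return "iatrogenic responders"
    return "nonresponders"


# Hypothetical improvement scores within three leaves.
print(classify_leaf([0.5, 0.6, 0.7], [0.1, 0.2, 0.3]))
print(classify_leaf([0.1, 0.2, 0.3], [0.5, 0.6, 0.7]))
print(classify_leaf([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))
```

A qualitative interaction corresponds to a tree containing at least one leaf of the first kind and one of the second, that is, leaves whose treatment effects differ in sign.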
Previous research using QUINTs has examined which subgroups of people with HIV and depressive symptoms benefited from a cognitive-behavioral intervention (van Luenen et al., 2020), subgroups for whom different health interventions work (Formanoy et al., 2016), and differences in treatment effect between antidepressants and placebo in the treatment of major depression (Maruo et al., 2020).
The QUINT models were fit using the
Glinternet
Glinternet is a variation on lasso that is designed to appropriately capture interactions while maintaining strong hierarchy (Lim & Hastie, 2015). Strong hierarchy is the concept that if an interaction term is included in a linear model, so too should its lower-order coefficients. Basic lasso models with interactions do not necessarily follow strong hierarchy because they may select the interaction term while shrinking the lower-order coefficients to zero. Glinternet models use the group lasso to select sets of predictors (e.g., two lower-order coefficients and their interaction) to ensure this strong hierarchy; however, group lasso alone would not be sufficient because it should be possible for main effects to be selected into the model without the interaction term. Glinternet achieves this by also including the main effects as possible predictors, and because of the regularization, it is possible for both the main effects and the sets of coefficients (lower order and interactions) to be included in the model simultaneously. This method, unlike QUINTs, is still prioritizing accurate prediction of the outcome but has shown promise as a method for identifying groups of treatment responders (Formanoy et al., 2016; Lipkovich et al., 2017; Maruo et al., 2020; van Luenen et al., 2020) and performed better than stepwise selection methods for this purpose (Wester et al., 2022).
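Strong hierarchy can be stated as a simple check on the set of selected terms. The sketch below is independent of the glinternet software; the function name and the example selections are hypothetical.

```python
def hierarchy_violations(main_effects, interactions):
    """Return the interactions whose two main effects are not both selected."""
    selected = set(main_effects)
    return [pair for pair in interactions if not set(pair) <= selected]


# A plain lasso fit may keep an interaction while dropping a main effect.
plain_lasso = hierarchy_violations(
    main_effects=["treatment"],
    interactions=[("treatment", "age")],
)

# Under strong hierarchy, an interaction enters only with both main effects,
# and a main effect (here "weight") may enter without any interaction.
hierarchical = hierarchy_violations(
    main_effects=["treatment", "age", "weight"],
    interactions=[("treatment", "age")],
)

print(plain_lasso)
print(hierarchical)
```

The second selection also shows the asymmetry noted above: a main effect can be included without an interaction, but not the reverse.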
We fit these models using the
Results
QUINT results
Primary outcome: PHDD
A total of 199 randomly assigned participants were included in the analysis; one participant was excluded because of having a missing outcome. The analysis resulted in a pruned tree with four leaves (see Fig. 1), with each leaf representing a subgroup of patients. In three of the four leaves, treatment with varenicline was more effective at reducing PHDD compared with placebo, displayed in green in Figure 1. In the remaining leaf, treatment with placebo was more effective at reducing PHDD compared with varenicline, displayed in red in Figure 1. The tree branches indicate the characteristics (variables) of the patient subgroups. For the treatment-responders subgroups (i.e., the three leaves that showed a benefit of varenicline over placebo in reducing PHDD), patients who were older at study onset (> 49.5 years) and treated with varenicline showed a benefit in reducing PHDD compared with patients of a similar age and treated with placebo (Leaf 4). For younger patients (≤ 49.5 years), patients with a lower resting heart rate (≤ 63.5; Leaf 1) or with a higher resting heart rate (> 63.5) who smoked more than 10.5 cigarettes per day on average over the past week (Leaf 3) and were treated with varenicline showed a benefit in reducing PHDD compared with patients of a similar age, health, and smoking profile treated with placebo. Finally, younger patients with a higher resting heart rate who smoked fewer cigarettes per day on average and were treated with placebo reduced PHDD compared with patients of a similar profile treated with varenicline (Leaf 2). The

Qualitative interaction tree (QUINT) for percent heavy drinking days. Results of the QUINT analysis for reduction in percent heavy drinking days. The vertical axis indicates the difference in means between the treatments. Each leaf in the tree is assigned to a subgroup (P1, P2, or P3). In the green leaves (Leaves 1, 3, and 4, labeled P1), treatment with varenicline was more beneficial than treatment with placebo, whereas for the red leaf (Leaf 2, labeled P2), treatment with placebo was more beneficial than treatment with varenicline. The average number of cigarettes per day was measured using past-week estimates from the Smoking Quantity and Frequency Questionnaire.
Qualitative Interaction Tree PHDD Results
Note: PHDD = percent heavy drinking days;
Glinternet results
Primary outcome: PHDD
The cross-validation error for the model was 0.07, indicating that in the cross-validation process, the average squared error for out-of-sample prediction was 0.07 (or about 26% change for PHDD). The glinternet model for proportion change in PHDD highlighted several features that interacted with treatment arm to define treatment responders, including participant age, medication dose, craving, race, age of first drink, drinking goal after treatment, several AUD diagnostic criteria, and number of cigarettes smoked in the past week. The interaction of participant age and treatment arm was such that the effect of varenicline was strongest among older individuals, but across all observed ages, varenicline reduced PHDD more than placebo. Medication dose interacted with treatment arm such that individuals in the varenicline group had decreased PHDD compared with placebo overall, whereas individuals in the placebo group who took more doses of placebo had worse proportion change in PHDD. Race, summarized as White (

Group least absolute shrinkage and selection operator interaction net (glinternet) plots for percent heavy drinking days (PHDD). Features from the glinternet model for predicted percent change in PHDD with quantitative interactions are displayed. The predicted percent change in PHDD is on the vertical axis. On the horizontal axis are the categories for each predictor. Abst = abstinence; temp abst = temporary abstinence; alcohol taken in larger amounts or over longer periods of time was measured using Item 12c (“During the times when you drank alcohol, did you end up drinking more than you planned when you started?”) from the MINI Alcohol Dependence/Abuse Module. (a) Drinking goal interacted with treatment such that individuals who reported a harm-reduction goal or did not have a goal had better outcomes when treated with varenicline; for individuals who reported an abstinence or temporary abstinence goal, there was no difference in outcome when treated with varenicline versus placebo. (b) Race interacted with treatment arm such that individuals who identified as White and more than one race/other race had a beneficial response to varenicline in reducing PHDD; for individuals who identified as Black, there was no difference in PHDD between participants treated with varenicline versus placebo. (c) Endorsement of the alcohol use disorder symptom of drinking more than planned interacted with treatment arm such that participants who did not endorse this criterion had fewer PHDD when treated with varenicline compared with placebo.
Craving, as measured by the PACS, interacted with treatment arm such that the effect of varenicline was largest among participants with low craving scores, but across all observed craving levels, varenicline reduced PHDD more than placebo. Age of first drink interacted with treatment arm such that the beneficial effect of varenicline was greatest among individuals who had a younger age of first drink. Drinking goal, summarized as abstinence, temporary abstinence, controlled drinking, occasional drinking, and no goal, interacted with treatment arm such that individuals who reported a harm-reduction goal or did not have a goal had better PHDD outcomes when treated with varenicline; for individuals who reported an abstinence or temporary-abstinence goal, there was no difference in PHDD when treated with varenicline versus placebo (see Fig. 2). Finally, several AUD diagnostic criteria interacted with treatment arm. The first criterion was drinking more than planned, which interacted with treatment arm such that participants who did not endorse this criterion had reduced PHDD when treated with varenicline compared with placebo (see Fig. 2). The second criterion was spending less time on other activities because of drinking, which interacted with treatment arm such that participants who did not endorse this criterion had reduced PHDD when treated with varenicline compared with placebo. The third criterion was drinking despite negative consequences, which interacted with treatment arm such that participants who endorsed this criterion had fewer PHDD when treated with varenicline compared with placebo. 
Finally, the number of cigarettes smoked in the past week interacted with treatment arm such that as the number of cigarettes smoked increased, the benefit of varenicline over placebo on PHDD also increased (i.e., there was little benefit of varenicline when the number of cigarettes smoked was 0 and a large benefit of varenicline over placebo when the number of cigarettes smoked was 25). For plots of the qualitative interactions, see Figure 2. The results of the glinternet models for the secondary outcomes are available in the Supplemental Material.
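The cross-validation error reported for the PHDD model (an average squared error of 0.07) can be placed on the scale of the outcome by taking its square root. The sketch below assumes that the "about 26%" figure given with that error is this root-mean-squared error.

```python
import math

cv_mse = 0.07                 # reported average squared out-of-sample error
cv_rmse = math.sqrt(cv_mse)   # same error on the scale of proportion change

print(f"{cv_rmse:.3f}")       # error as a proportion
print(f"{100 * cv_rmse:.0f}%")  # error as a percentage
```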
For the multiple-imputation analysis of the glinternet model, Table S2 in the Supplemental Material provides a summary of the proportion of times each interaction with treatment arm was selected. Note that age and drinking goal were selected in 100% of the imputed data sets. Race and heart rate were selected in 98% and 97% of the imputed data sets, respectively. Craving was selected in 49% of the imputed data sets, and no other variables were selected in more than 20% of the imputed data sets. Cross-validation error was similar in the imputed data sets to the listwise-deleted data set (range = 0.26–0.28,
Discussion
This secondary analysis of a large, multisite clinical trial of varenicline for AUD sought to identify treatment responders using two distinct machine-learning methods: QUINT and glinternet. Both approaches identified older individuals and participants who smoked more as potential responders to varenicline, with larger reductions in PHDD. This key finding is consistent with theory-based moderator analysis suggesting that smokers may respond better to varenicline for the treatment of AUD (Falk et al., 2015) and with positive varenicline trials for drinking outcomes conducted in smoking samples (Fucito et al., 2011; Mitchell et al., 2012). The alignment between data-driven and theory-driven findings is key because the benefits of machine-learning models are predicated on their ability to generate generalizable results.
A more in-depth interpretation of the results from the QUINT models for the primary outcome (PHDD) showed that patients who were older at study onset (> 49.5 years) and treated with varenicline showed a benefit in reducing PHDD compared with patients of a similar age and treated with placebo. For younger patients (≤ 49.5 years), patients with a lower resting heart rate (≤ 63.5) or with a higher resting heart rate (> 63.5) who smoked more than 10.5 cigarettes per day on average over the past week and were treated with varenicline showed a benefit in reducing PHDD compared with patients of a similar age, health, and smoking profile treated with placebo. This finding suggests that not only smoking status and age but also a measure of cardiovascular health (resting heart rate) was necessary to identify treatment responders. The benefits of varenicline for these groups are clinically significant such that they resulted in an expected number of heavy drinking days ranging from eight to 13, which is a significant reduction from the average baseline of 26 heavy drinking days. Furthermore, younger patients with a higher resting heart rate who smoked fewer cigarettes per day, on average, experienced an iatrogenic effect of varenicline treatment. This was evidenced by higher PHDD in individuals treated with varenicline compared with placebo-treated individuals with similar profiles.
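The subgroups described above can be written as a set of nested rules, using the thresholds and leaf numbers reported for the PHDD tree. The sketch is a restatement of the reported splits for illustration only, not a validated clinical decision rule; the function and argument names are hypothetical.

```python
def phdd_leaf(age, resting_heart_rate, cigarettes_per_day):
    """Return (leaf number, arm favored in that leaf) for the PHDD tree.

    cigarettes_per_day is the past-week average number of cigarettes per day.
    """
    if age > 49.5:
        return (4, "varenicline")   # older patients
    if resting_heart_rate <= 63.5:
        return (1, "varenicline")   # younger, lower resting heart rate
    if cigarettes_per_day > 10.5:
        return (3, "varenicline")   # younger, higher heart rate, heavier smoking
    return (2, "placebo")           # younger, higher heart rate, lighter smoking


print(phdd_leaf(age=55, resting_heart_rate=70, cigarettes_per_day=0))
print(phdd_leaf(age=40, resting_heart_rate=60, cigarettes_per_day=0))
print(phdd_leaf(age=40, resting_heart_rate=70, cigarettes_per_day=15))
print(phdd_leaf(age=40, resting_heart_rate=70, cigarettes_per_day=5))
```

Written this way, the structure makes clear that heart rate and smoking matter only among younger patients and that smoking matters only among younger patients with a higher resting heart rate.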
A particularly useful outcome of the QUINT models is the estimated standardized effect size in each leaf. In particular, this information could be used to make projections about future RCTs that target likely responding populations. For example, the original study found an effect size of
Results of the glinternet model for the primary outcome showed a number of nuanced effects. Note that alcohol craving, as measured by the PACS, interacted with treatment arm such that the effect of varenicline was largest among participants with low craving scores, yet across all observed craving levels, varenicline reduced PHDD more than placebo. This is relevant given that a recent study specifically probing the craving-reducing properties of varenicline versus placebo reported null effects for cue-induced craving (Miranda et al., 2020). In another nuanced finding, drinking goal interacted with treatment arm such that individuals who reported a harm-reduction goal or did not have a goal had better PHDD outcomes when treated with varenicline; for individuals who reported an abstinence or temporary-abstinence goal, there was no difference in PHDD when treated with varenicline compared with placebo. We conclude that varenicline may be especially useful for individuals with a controlled drinking goal, presenting an opportunity for personalized medicine. On a related note, not endorsing specific AUD diagnostic criteria (i.e., having fewer specific AUD symptoms) predicted better clinical response to varenicline, which is consistent with a recent study in which lower AUD severity was associated with better clinical response (Donato et al., 2021). Finally, the number of cigarettes smoked in the past week interacted with treatment arm such that as the number of cigarettes smoked increased, the benefit of varenicline over placebo on PHDD also increased (i.e., there was little benefit of varenicline when the number of cigarettes smoked was 0 and a large benefit of varenicline over placebo when the number of cigarettes smoked was 25). This finding underscores the relationship between smoking and drinking outcomes for varenicline treatment. 
A recent trial by our laboratory found that among heavy-drinking smokers, varenicline alone produced higher smoking-cessation rates than the combination of varenicline plus naltrexone (Ray et al., 2021).
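As a reading aid, the cigarette-by-treatment interaction described above can be written in a simplified linear form. The notation here is illustrative and assumed for this sketch (it is not taken from the fitted glinternet model, and no coefficient values are implied): T is 1 for varenicline and 0 for placebo, and C is the number of cigarettes smoked in the past week.

```latex
\begin{aligned}
\mathrm{PHDD}_i &= \beta_0 + \beta_T T_i + \beta_C C_i + \beta_{TC}\, T_i C_i + \varepsilon_i, \\
\Delta(C) &= \mathbb{E}[\mathrm{PHDD} \mid T = 1, C] - \mathbb{E}[\mathrm{PHDD} \mid T = 0, C]
           = \beta_T + \beta_{TC}\, C .
\end{aligned}
```

Under this form, the pattern reported above (little benefit at C = 0 and a larger benefit at higher C) would correspond to a value of \beta_T near zero and a negative \beta_{TC}, so that the reduction in PHDD attributable to varenicline grows as C increases.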
Although it is important to highlight the moderators identified using the machine-learning approaches, it is also of interest to note consistency in variables that were not identified as treatment moderators using these approaches. Neither the QUINT nor the glinternet models identified several variables as modifiers, including alcohol-related consequences from the ImBIBe questionnaire, family history of AUD, alcohol withdrawal, and sex. These results are consistent with the findings of Falk and colleagues (2015), who used a theory-based approach to moderator testing and also did not identify these variables as significant moderators of treatment response to varenicline. This study did not identify sex as a potential treatment modifier, which is of particular interest given that some studies have found sex differences in drinking outcomes in individuals with an AUD with comorbid smoking (Bold et al., 2019; O’Malley et al., 2018). Specifically, men treated with varenicline had better drinking outcomes during the trial (O’Malley et al., 2018) and after treatment (Bold et al., 2019) relative to placebo. The inconsistent findings related to sex as a treatment moderator may be due to study design. The study by O’Malley and colleagues (2018) specifically recruited individuals with AUD who were comorbid smokers, whereas the multisite study of varenicline recruited individuals with an AUD regardless of smoking status (Litten et al., 2013). The O’Malley et al. study also had a longer treatment duration and a more intensive behavioral intervention combined with pharmacotherapy treatment, and its participants had more days abstinent and fewer heavy drinking days at baseline compared with participants in the multisite trial. These differences in study design and participant characteristics may account for the discrepancy.
Both studies enrolled substantially more male than female participants (≈70% male in both studies); future studies should enroll equal numbers of males and females, blocked by smoking status, to fully explore sex differences in response to treatment with varenicline.
From a methodological viewpoint, in this study, we used two machine-learning approaches that have been suggested for accurately identifying treatment responders (Lipkovich et al., 2017). By using two methods, we were able to compare results from each; however, future studies may select one of these methods rather than using both. Each method has relative pros and cons that should be considered when selecting a method in the future. Previous simulation research suggests that both QUINT and glinternet models can perform well with reasonably small samples (e.g.,
Although we recommend using QUINT models for future work on identifying treatment responders, it is important not to overinterpret the results of these models. In particular, the cut points used to create the trees (e.g., age > 49.5 years, heart rate > 63.5) are determined to optimize prediction; however, these points should not be interpreted as qualitative change points or piecewise transitions. The response to varenicline may increase or decrease continuously with a continuous moderator, yet the QUINT models will always place a cutoff within that continuum. In addition, the cutoffs are specific to the current data and, just like any other statistic, are expected to have some sampling variability. Note that the age cutoff used in the PHDD model (49.5 years) was different from the cutoff in the DPD model (54.5 years); this likely reflects sampling variability rather than a systematic difference between the two outcomes. Future research with larger and representative samples could be used to generate more reliable cutoffs. New methods that can provide some uncertainty around cutoff criteria would be beneficial in determining diagnostic decision rules. Although we believe the QUINT models likely do a good job of identifying an optimized cutoff given the data at hand, they do not provide an uncertainty estimate for that cutoff and so should be interpreted with caution. Thus, we have focused our interpretations on general directionality (e.g., older, younger) rather than the specific cutoff selected by the QUINT models.
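The point about cutoffs can be illustrated with a small simulation. The sketch below is not the QUINT algorithm and does not use the trial data: it assumes a synthetic trial in which the treatment benefit grows smoothly with age, applies a deliberately simplified single-split search (choosing the age cutoff that maximizes the difference in treatment effect between the two sides), and then bootstraps that search. All function names, sample sizes, and parameter values are hypothetical choices made for illustration.

```python
import random
import statistics

MIN_ARM = 10  # assumed minimum number of participants per arm on each side of a split


def simulate_trial(n, rng):
    """Synthetic data: the benefit of treatment increases linearly with age (no true step)."""
    rows = []
    for _ in range(n):
        age = rng.uniform(21, 70)
        treated = rng.random() < 0.5
        # Hypothetical outcome where lower is better; benefit is proportional to (age - 21).
        benefit = -0.2 * (age - 21) if treated else 0.0
        outcome = 40 + benefit + rng.gauss(0, 10)
        rows.append((age, treated, outcome))
    return rows


def subgroup_effect(rows):
    """Mean outcome difference (treated minus control); None if an arm is too small."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    if len(treated) < MIN_ARM or len(control) < MIN_ARM:
        return None
    return statistics.fmean(treated) - statistics.fmean(control)


def best_cutoff(rows, candidates):
    """Simplified split search: the cutoff with the largest gap in effect between sides."""
    best, best_gap = None, -1.0
    for cut in candidates:
        low = subgroup_effect([r for r in rows if r[0] <= cut])
        high = subgroup_effect([r for r in rows if r[0] > cut])
        if low is None or high is None:
            continue
        gap = abs(high - low)
        if gap > best_gap:
            best, best_gap = cut, gap
    return best


def bootstrap_cutoffs(rows, candidates, n_boot, rng):
    """Repeat the split search on resampled data to see how much the cutoff moves."""
    cuts = []
    for _ in range(n_boot):
        sample = rng.choices(rows, k=len(rows))
        cut = best_cutoff(sample, candidates)
        if cut is not None:
            cuts.append(cut)
    return cuts


rng = random.Random(0)
trial = simulate_trial(200, rng)
candidates = [a + 0.5 for a in range(25, 66)]  # half-year cut points, as in a tree split
cuts = bootstrap_cutoffs(trial, candidates, 100, rng)
print("selected cutoffs range from", min(cuts), "to", max(cuts))
print("median selected cutoff:", statistics.median(cuts))
```

Even though the simulated benefit has no step anywhere, the search always returns some cutoff, and the spread of cutoffs across bootstrap resamples gives a rough sense of how unstable a single selected value can be. A bootstrap of this kind is only one possible way to attach uncertainty to a cutoff.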
In closing, in this study, we tested two machine-learning approaches to elucidating clinical response to varenicline versus placebo for the treatment of AUD. Results were generally consistent with theory-driven models and the extant literature by highlighting the role of smoking status (smokers responded better to varenicline), severity of AUD (lower severity associated with better response to varenicline), medication adherence, and drinking goal (individuals with a controlled-drinking goal responded better to varenicline). However, other variables, such as sex, were not selected as moderators in these analyses. This study is the first to demonstrate the shared contributions of all of these variables simultaneously and in competition with a broad set of other potential moderators. This study also produced novel findings, including the interaction between age and cardiovascular health in predicting clinical response. This is noteworthy given that smoking and alcohol use have detrimental effects on cardiovascular-health outcomes (Ebbert et al., 2005). Participants with lower alcohol-craving levels responded better to varenicline versus placebo. This is relevant given a recent study that showed varenicline did not blunt cue-induced craving in individuals with AUD compared with placebo (Miranda et al., 2020). This study confirms established predictors of response to varenicline for AUD while also presenting new directions that are clinically useful. This study adds to existing literature suggesting that varenicline may be particularly useful for individuals who drink alcohol heavily and also smoke cigarettes. A novel contribution is the suggestion that varenicline may be recommended for individuals with a controlled-drinking goal and individuals with lower levels of subjective alcohol craving.
Together, these findings have high potential for clinical dissemination because they elucidate key variables thought to identify responders and individuals for whom varenicline treatment may be iatrogenic. As the field continues to integrate machine-learning methods in personalized medicine, studies that can provide a nuanced methodological examination of machine-learning methods and effectively integrate them with the medication-development literature have the highest potential to inform clinical practice.
Supplemental Material
Supplemental material, sj-doc-1-cpx-10.1177_21677026231169922 for Identifying Treatment Responders to Varenicline for Alcohol Use Disorder Using Two Machine-Learning Approaches by Erica N. Grodin, Amanda K. Montoya, Alondra Cruz, Suzanna Donato, Wave-Ananda Baskerville and Lara A. Ray in Clinical Psychological Science
