Sage Journals: Discover world-class research

Abstract

Varenicline has shown promise for treating alcohol use disorder (AUD); however, not everyone will respond to varenicline. Machine-learning methods are well suited to identify treatment responders. In the present study, we examined data from the National Institute on Alcohol Abuse and Alcoholism Clinical Intervention Group multisite clinical trial of varenicline using two machine-learning methods. Baseline characteristics taken from a randomized clinical trial of varenicline were examined as potential moderators of treatment response using qualitative interaction trees (N = 199) and group least absolute shrinkage and selection operator interaction nets (N = 200). Results align with prior research, highlighting smoking status, AUD severity, medication adherence, and drinking goal as predictors of treatment response. Novel findings included the interaction between age and cardiovascular health in predicting clinical response and stronger medication effects among individuals with lower craving. With increased integration of machine-learning methods, studies that effectively integrate methods and medication development have high potential to inform clinical practice.

Keywords

varenicline alcohol use disorder machine learning QUINT glinternet treatment responder personalized medicine

Varenicline is a partial agonist of the α4β2 nicotinic acetylcholine receptor. It is approved by the Food and Drug Administration for smoking cessation but has shown promise for the treatment of alcohol use disorder (AUD) because nicotinic acetylcholine receptors are implicated in both alcohol and nicotine reward (Lê et al., 2000). Results supporting varenicline’s efficacy as an AUD treatment have been mixed. Two preliminary randomized controlled trials (RCTs) found varenicline to be effective at reducing drinking and alcohol craving (Fucito et al., 2011; Mitchell et al., 2012) in individuals who smoke and drink heavily. These results were confirmed by a large multisite RCT, which found reductions in heavy drinking days, drinks per drinking day, and alcohol craving in individuals with an AUD (Litten et al., 2013). However, several RCTs have found no effect of varenicline on drinking outcomes (de Bejczy et al., 2015; Plebani et al., 2013), and a multisite human laboratory investigation found that varenicline did not reduce alcohol cue-induced craving (Miranda et al., 2020). A recent study by our group found that varenicline plus naltrexone was superior to varenicline plus placebo for reducing drinks per drinking day in a sample of heavy-drinking smokers who were treatment seeking for smoking cessation and drinking reduction (Ray et al., 2021).

Although varenicline shows promise, not all individuals will respond to it, suggesting the potential for treatment moderators, as is the case for other AUD pharmacotherapies. Treatment moderators are variables across whose values the effect of treatment may vary. Note that moderators do not necessarily cause differences in the treatment effect (i.e., intervening on a moderator may not change the treatment effect); rather, they describe populations across which the treatment effect may vary. To that end, identifying treatment responders to varenicline represents a promising research area. Falk et al. (2015) conducted an analysis examining 17 potential moderators of varenicline’s efficacy and found four significant moderators: reductions in cigarette smoking, treatment drinking goal, years of regular drinking, and patient age. More recently, using the same multisite trial, Donato et al. (2021) found that in individuals with low-severity AUD, varenicline reduced drinking and improved quality of life relative to placebo, whereas high-severity individuals did not benefit from treatment with varenicline. In sum, moderator analyses to date have suggested that AUD severity, as a multidimensional factor (Donato et al., 2021) or as individual indicators (Falk et al., 2015), may help identify treatment responders to varenicline. However, both of these previous studies used traditional statistical methods such as linear regression and linear mixed models with null hypothesis testing. These methods are limited because they frequently cannot identify complex patterns of moderation that involve multiple variables or allow for many candidate moderators.

An alternate approach to identifying treatment responders relies more heavily on data-driven methods than on a priori moderators. Specifically, machine-learning methods have become increasingly popular and are well suited for efforts to identify treatment responders using data-driven tools. One benefit of machine-learning approaches is their focus on prediction accuracy over traditional statistical significance indicated by a p value (Bzdok & Meyer-Lindenberg, 2018). Other benefits include their ability to handle multiple outcomes at once and their suitability for observational data (Bzdok & Meyer-Lindenberg, 2018). Machine-learning methods, particularly those that use regularization, can accommodate more predictors without a major loss of predictive power (Friedman et al., 2008). This is a particular advantage over traditional statistical methods because real-world interactions may involve multiple variables, and many candidate variables may be identified as potential moderators.

In the field of AUD-treatment development, machine-learning approaches were used in a recent reanalysis of the multisite trial of gabapentin (Laska et al., 2020). Specifically, this study used a random forest based on prerandomization (i.e., baseline) variables to identify likely responders. The primary report from the clinical trial found no significant treatment effect (Falk et al., 2019). Results from the machine-learning analysis revealed that likely responders had a higher number of heavy-drinking days, lower levels of anxiety and mood symptoms, and higher cognitive and motor impulsivity scores at baseline relative to unlikely responders (Laska et al., 2020). Thus, likely responders to gabapentin may demonstrate higher levels of externalizing symptoms and lower levels of internalizing symptoms compared with unlikely responders. Likewise, Gueorguieva et al. (2015) used tree-based approaches to assess treatment moderators in the COMBINE study and identified individuals with shorter abstinence periods who are not overweight or obese as likely responders to acamprosate.

The application of machine learning to predicting treatment response is in its infancy, and there is no current consensus on best methods (Lipkovich et al., 2017). Regression trees and random forests are excellent at building models with complex interactions, which are helpful for identifying treatment responders because the hypothesis involves an interaction between treatment and other baseline variables. These methods iteratively split participants according to candidate variables to maximize homogeneity in the outcome within groups, maximizing predictive accuracy. However, tree-based approaches can produce very complex models that are difficult to interpret.
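To make the splitting idea concrete, the sketch below shows one step of an ordinary regression tree (CART-style): for a single continuous predictor, it tries every threshold and keeps the one that minimizes the pooled within-group sum of squared errors of the outcome. It is an illustration only, with hypothetical function names and a hypothetical `min_leaf` setting; it is not the QUINT criterion used later, which targets the treatment effect rather than the outcome.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf=5):
    """Find the threshold t on predictor x (split: x <= t vs. x > t) that
    minimizes the total within-group SSE of outcome y.
    Returns (threshold, total_sse); threshold is None if no valid split exists.
    min_leaf (>= 1) is the minimum number of cases allowed in each group."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated by a threshold
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
            best = (threshold, total)
    return best


# Hypothetical data: the outcome jumps between x = 9 and x = 10.
x = list(range(20))
y = [0.0] * 10 + [1.0] * 10
print(best_split(x, y))
```

A full tree repeats this search over every candidate predictor and then recurses into each resulting group, which is how complex interactions emerge without being specified in advance.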

An alternative approach involves a regularized linear model, such as least absolute shrinkage and selection operator (lasso) and related methods, including group lasso and elastic net. These models use a penalty such that regression coefficients are shrunk toward zero, resulting in only a subset of predictors included in the model. These models align well with clinical researchers’ understanding of linear regression and allow for more interpretable models. However, they are limited to linear models and require specification of interactions. Applications of the lasso in detecting treatment responders typically allow for only two- or three-way interactions (Ballarini et al., 2018), whereas random forests easily identify nonlinear relationships and complex interactions of any form. Both regression-tree methods and regularization methods show promise for being able to predict treatment response; however, these methods have complementary benefits and limitations. On the basis of the novelty of machine-learning approaches and the pros and cons of various methods, we explore using one of each method (i.e., tree-based methods vs. regularization methods) and compare their relative performance. Although future simulation research must compare the performance of these methods, researchers may also select a method based on practical concerns, such as software implementation, types of results output, and methods for handling missing data. This study cannot provide recommendations for which methods will perform best at prediction in applications beyond the current one, but its results can speak to these practical concerns.
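The difference between shrinking coefficients and setting them exactly to zero is easiest to see in the textbook special case of an orthonormal design, where both penalized estimates have closed forms in terms of the ordinary least squares (OLS) estimates. The sketch below assumes that special case and the penalty scalings stated in the docstrings; the coefficient values are hypothetical. Real clinical predictors are correlated, so software such as glinternet solves the penalized problem numerically instead.

```python
import math


def lasso_orthonormal(b_ols, lam):
    """Lasso coefficients when the design is orthonormal (X'X = I), for the
    objective 0.5*||y - Xb||^2 + lam*||b||_1: soft-threshold each OLS estimate."""
    return [math.copysign(abs(b) - lam, b) if abs(b) > lam else 0.0 for b in b_ols]


def ridge_orthonormal(b_ols, lam):
    """Ridge coefficients under the same design, for the objective
    0.5*||y - Xb||^2 + 0.5*lam*||b||^2: every estimate is scaled by 1/(1 + lam)."""
    return [b / (1 + lam) for b in b_ols]


b_ols = [2.0, -0.8, 0.3, 0.05]          # hypothetical OLS estimates
lasso = lasso_orthonormal(b_ols, 0.5)   # the two small coefficients become exactly 0
ridge = ridge_orthonormal(b_ols, 0.5)   # all four shrink, but none reaches 0
```

The exact zeros are what make lasso-type models select a subset of predictors, whereas ridge keeps every predictor in the model.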

Toward identifying responders to varenicline and improving the application of machine learning to the prediction of clinical response in AUD, in the present study, we examined data from the National Institute on Alcohol Abuse and Alcoholism (NIAAA) Clinical Intervention Group (NCIG) trial of varenicline using two machine-learning methods. These data-driven approaches differ from the two previous studies of a priori predictors (Donato et al., 2021; Falk et al., 2015) and have the benefit of prioritizing out-of-sample prediction and allowing for data-driven discovery. This study aims to identify treatment responders using a host of clinical and demographic baseline variables and to compare two machine-learning methods. Specifically, we compared (a) the qualitative interaction tree (QUINT; Dusseldorp & Van Mechelen, 2014), which is a specific type of regression tree, and (b) the group lasso interaction net (glinternet; Lim & Hastie, 2015), which is a specific type of group lasso. This comparison seeks to elucidate best practices for AUD clinical-trial treatment-responder investigations.

Transparency and Openness

The studies or analyses reported in this article were not preregistered. Data for this study are available through NIAAA’s NCIG. A material transfer agreement is needed to obtain the data. Furthermore, the materials, code, and other resources developed as part of the research reported for this study can be obtained through reasonable request to the corresponding author. This study involved an analysis of existing data rather than new data collection and did not require additional ethical approval.

Method

Study overview

Data for this secondary analysis were drawn from a 13-week, multisite, Phase 2, double-blind, placebo-controlled, parallel-group trial of varenicline (NCT01146613; Litten et al., 2013). This data set was publicly available to qualified investigators and was obtained by L. A. Ray through a material transfer agreement with the NCIG/NIAAA. Participants were randomly assigned to varenicline or placebo in a 1:1 ratio; random assignment was stratified by trial site and regular smoking status (10 or more vs. fewer than 10 cigarettes smoked per day over the past week). Varenicline dosage was titrated as follows: A starting dose of 0.5 mg was taken once a day on Days 1 through 3; next, a dose of 0.5 mg was taken twice a day on Days 4 through 7; and the target dose of 1 mg taken twice daily was maintained from Weeks 2 through 13 (for dosage methods, see Litten et al., 2013).
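The titration schedule can be written as a small lookup from study day to total daily dose. This is a sketch assuming that Week 1 covers Days 1 through 7 and that Week 13 ends on Day 91; the function name is hypothetical.

```python
def varenicline_daily_dose_mg(day):
    """Total daily varenicline dose (mg) on study day 1..91 under the titration
    described above: 0.5 mg once daily (Days 1-3), 0.5 mg twice daily (Days 4-7),
    then 1 mg twice daily (Weeks 2-13, assumed to be Days 8-91)."""
    if not 1 <= day <= 91:
        raise ValueError("day must be between 1 and 91 (13 weeks)")
    if day <= 3:
        return 0.5 * 1   # one 0.5-mg dose
    if day <= 7:
        return 0.5 * 2   # two 0.5-mg doses
    return 1.0 * 2       # two 1-mg doses (target dose)


# Cumulative milligrams over the full 13-week schedule, under the day-count assumption above.
total_mg = sum(varenicline_daily_dose_mg(d) for d in range(1, 92))
```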

Study population

Randomly assigned study participants were 200 individuals (142 males, 58 females) diagnosed with past-year alcohol dependence according to the DSM-IV-TR (American Psychiatric Association, 2000). Inclusion criteria were age of at least 18 years, reported drinking at least 28 drinks per week for females and 35 drinks per week for males for the 28 days before consent and for the 7 days before random assignment, and a breath alcohol content of 0.000 g/dl for study consent. Exclusion criteria were past-year substance dependence on any substance other than alcohol or nicotine, having undergone medical detoxification during study screening, previous treatment with varenicline, and past-year suicidality risk or a history of ever attempting suicide (for complete inclusion and exclusion criteria and participant demographics, see Litten et al., 2013).

Measures

Primary and secondary study endpoints

In the current analysis, we used the primary and secondary drinking endpoints from the original RCT. The primary efficacy outcome for the trial was percent heavy drinking days (PHDD), with a heavy drinking day defined as 4 or more drinks per day for females and 5 or more drinks per day for males, during the maintenance phase of the trial (Weeks 2–13). Secondary drinking outcomes examined in the current study were drinks per day (DPD), drinks per drinking day (DPDD), and percent very heavy drinking days (PVHDD; 8+/10+ drinks per day for females and males, respectively) during the maintenance phase of the trial. All drinking measures were captured through the Timeline Followback and Form 90 interview methodology and procedures (Miller, 1996; Sobell & Sobell, 1992).
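As a sketch of how these endpoints can be computed from daily Timeline Followback counts, the hypothetical helpers below apply the sex-specific thresholds given above. The `proportion_change` helper assumes proportion change is (follow-up − baseline) / baseline; that definition is consistent with how expected heavy drinking days are translated in the Results but is not spelled out in this section.

```python
# Heavy and very heavy drinking-day thresholds (drinks per day) from the trial definitions.
HEAVY = {"female": 4, "male": 5}
VERY_HEAVY = {"female": 8, "male": 10}


def drinking_endpoints(daily_drinks, sex):
    """Summarize a list of daily standard-drink counts for one assessment window."""
    n_days = len(daily_drinks)
    drinking_days = [d for d in daily_drinks if d > 0]
    return {
        "PHDD": 100 * sum(d >= HEAVY[sex] for d in daily_drinks) / n_days,
        "PVHDD": 100 * sum(d >= VERY_HEAVY[sex] for d in daily_drinks) / n_days,
        "DPD": sum(daily_drinks) / n_days,
        # DPDD is undefined for an abstinent window; NaN flags that case.
        "DPDD": sum(drinking_days) / len(drinking_days) if drinking_days else float("nan"),
    }


def proportion_change(baseline, follow_up):
    """Assumed definition: (follow-up - baseline) / baseline; negative = reduction."""
    return (follow_up - baseline) / baseline


week = [0, 5, 10, 2, 0, 6, 0]  # hypothetical daily counts for a male participant
print(drinking_endpoints(week, "male"))
```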

Baseline measures

Candidate predictors were selected according to theoretical and practical concerns. A list of 382 variables was collected from the original trial data. Of these, 240 variables were excluded because of limited theory suggesting their candidacy as potential moderators (e.g., usual occupation). An additional 77 variables were excluded for being collected after baseline. Finally, 29 variables were excluded for having limited variance or low cell frequency (e.g., ethnicity). This resulted in 36 candidate predictors, described below. Summary statistics for all candidate predictors are included in the Supplemental Material available online.

Demographics

Baseline demographic variables were assessed during the initial screening visit. Variables included in the present analysis were sex (male or female), age, race (White, Black or African American, more than one race, or other race), marital status (divorced, legally married, living with partner or cohabiting, never married, separated, widowed), formal education (high school, more than high school), and past 30-day employment (full-time 35+ hours per week, part-time regular hours, part-time irregular hours or day work, military service, homemaker, retired, disabled, student, unemployed).

Alcohol-use variables

The following variables assessed alcohol use severity and history: DSM-IV-TR (American Psychiatric Association, 2000) alcohol dependence or abuse diagnoses, DSM-IV-TR (American Psychiatric Association, 2000) alcohol-dependence symptoms (seven symptoms with presence or absence) assessed via the MINI International Neuropsychiatric Interview (MINI; Sheehan et al., 1998), family history of AUD (alcohol problems, no alcohol problems, unknown) assessed via the Family Tree Questionnaire (Mann et al., 1985), alcohol-withdrawal score assessed through the Clinical Institute Withdrawal Assessment of Alcohol (Sullivan et al., 1989), alcohol-craving score measured through the Penn Alcohol Craving Scale (PACS; Flannery et al., 1999), and negative drinking-related consequences score assessed via the ImBIBe, which is an adaptation of the Drinker Inventory of Consequences (Miller, 1995; Werner et al., 2008).

Drinking- and treatment-history variables were collected as part of the medical-history exam and included age when first started drinking, number of inpatient hospitalizations to reduce or quit drinking, number of outpatient visits to reduce or quit drinking, and number of alcohol-treatment group meetings attended over the past year. Finally, treatment goal was defined as one of the following: (a) total abstinence, (b) occasional drinking, (c) temporary abstinence, (d) controlled drinking, and (e) no stated goal. Other drinking variables included confidence in improving drinking from study participation (1–5 scale; 1 = not confident, 5 = extremely confident) and preferred alcoholic beverage (beer, wine, straight liquor, cocktail, other).

Cigarette smoking

Smoking variables included in the present study were smoking stratification from study random assignment, frequency of smoking (not at all, occasional, daily) assessed via Question 1 of the Fagerström Test for Nicotine Dependence (Heatherton et al., 1991), and past-week smoking frequency, including number of cigarettes smoked over the past week and average number of cigarettes smoked per day over the past week.

Quality of life

Quality of life was measured by baseline Mental and Physical Aggregate Scores on the Short Form-12 (Szabo, 1996).

Vital signs

The following vital signs were measured as part of the physical-eligibility screening: weight, sitting systolic blood pressure, sitting diastolic blood pressure, and sitting heart rate.

Medication compliance

Medication compliance was assessed through the number of capsules taken over the course of the trial. This was obtained and recorded according to face-to-face interviews and pill counts at study visits. Because this variable was measured after random assignment, we evaluated whether compliance differed significantly between participants randomly assigned to placebo and those assigned to varenicline. We did not find a significant difference, t(198) = 0.60, p = .55, suggesting that treatment arm did not affect compliance and allowing us to use it as a candidate moderator of the treatment effect.
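The compliance check above is a two-sample t test; with 200 participants across the two arms, the pooled-variance (Student) version has 200 − 2 = 198 degrees of freedom, matching the reported t(198). The sketch below computes that statistic with the Python standard library. The function name and the small capsule counts are hypothetical, and the pooled-variance form is an assumption based on the reported degrees of freedom. `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` uses the same pooled form, additionally returns the p value.

```python
import math
from statistics import mean, variance


def pooled_t_test(a, b):
    """Student's two-sample t statistic with a pooled variance estimate.
    Returns (t, df), where df = n1 + n2 - 2."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df


# Hypothetical capsule counts for two small groups.
t, df = pooled_t_test([150, 160, 170], [155, 165, 175])
```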

Statistical analysis

There are a variety of machine-learning methods that have been used for detecting interactions, and some have already been used specifically for identifying treatment responders. We selected one tree-based method and one regularization-based method because these are the two primary types of methods available. Tree-based methods all offer the advantage of allowing for complex interactions; however, traditional tree-based methods (e.g., decision trees, random forests) search the complete interaction space rather than focusing specifically on interactions with treatment. Some methods instead explore interactions with a specific variable (in our case, treatment arm). We considered both the virtual-twins method (Foster et al., 2011) and the simultaneous-threshold-interaction modeling algorithm (Dusseldorp et al., 2010), but both of these are global-outcome modeling methods (Lipkovich et al., 2017), meaning that they prioritize accurate prediction of the outcome (in our case, drinking behavior) over accurate prediction of the treatment effect. We also considered interaction trees (Su et al., 2008, 2009), which focus on identifying variables that interact with a specified variable (e.g., treatment arm), which was closer to our goal. However, QUINT models achieve the same goal while also identifying qualitative interactions, that is, interactions in which the direction of the treatment effect depends on the value of the moderator. The key contrast is that an interaction tree might split participants into one group that responds very well to varenicline and another that responds only slightly; because varenicline is favored in both groups, the treatment recommendation does not differ between them. We therefore used QUINT models because they focus on identifying splits that change the treatment recommendation.
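The distinction between qualitative and quantitative interactions can be stated as a simple rule on subgroup treatment effects. In the sketch below, which is purely illustrative (the function and its `tol` argument are hypothetical), each effect is the improvement under varenicline minus the improvement under placebo within one subgroup of a candidate moderator.

```python
def interaction_type(subgroup_effects, tol=0.0):
    """Classify a moderator from its subgroup treatment effects.
    Positive effect = varenicline favored; negative effect = placebo favored."""
    favors_treatment = any(e > tol for e in subgroup_effects)
    favors_control = any(e < -tol for e in subgroup_effects)
    if favors_treatment and favors_control:
        return "qualitative"   # sign flips: the recommendation depends on the subgroup
    if len(set(subgroup_effects)) > 1:
        return "quantitative"  # same direction, different magnitude
    return "none"


strong_vs_weak = interaction_type([0.30, 0.05])   # both groups favor varenicline
opposite = interaction_type([0.30, -0.20])        # one group favors placebo
```

Only the second pattern would change which treatment is recommended, which is the kind of split QUINT is designed to find.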

Among regularization methods, we considered lasso, ridge, and elastic net; of these, elastic net tends to perform best. However, these models do not typically (by default) incorporate interactions. Although candidate interactions can be entered into these models, the model will not necessarily enforce “strong hierarchy.” Strong hierarchy is a property of a model such that if an interaction (e.g., XZ) is selected into the model, so too are its lower-order predictors (X and Z). This can be addressed with the group lasso (Bakin, 1999), which treats the interaction and its lower-order predictors (XZ, X, Z) as a group, so that if any one of them is selected into the model, all are selected. However, this can result in models with limited prediction accuracy because variables can enter the model only as part of an interaction group: Variables that have main effects but no interactions are not selected. The glinternet method allows either the lower-order predictors alone or the lower-order predictors together with their interactions to be selected into the model. For this reason, we selected the glinternet method for this study.
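Strong hierarchy can be checked mechanically on any set of selected terms. The sketch below assumes a hypothetical naming convention in which an interaction between `X` and `Z` is written `"X:Z"`; it reports any selected interaction whose lower-order terms are not also selected.

```python
def hierarchy_violations(selected):
    """Return selected interactions whose lower-order terms are not all selected.
    Interactions are assumed to be named 'X:Z' (hypothetical convention)."""
    selected = set(selected)
    violations = [
        term for term in selected
        if ":" in term and not all(part in selected for part in term.split(":"))
    ]
    return sorted(violations)


ok = hierarchy_violations({"age", "treatment", "age:treatment"})   # hierarchy respected
bad = hierarchy_violations({"treatment", "age:treatment"})        # main effect of age missing
```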

QUINT

QUINT is a tree-based approach designed to detect qualitative interactions. Qualitative interactions occur when the effect of one variable (e.g., treatment) is reversed across different levels of a moderator (e.g., baseline characteristics; Dusseldorp et al., 2016; Dusseldorp & Van Mechelen, 2014). Unlike traditional regression trees, which split to maximize prediction accuracy for the outcome, QUINT splits to maximize prediction accuracy for the treatment effect within groups. It achieves this by splitting groups according to a criterion that combines two characteristics: (a) the magnitude of the treatment effect on the outcome and (b) the size of the groups after splitting. Splits aim to maximize both components. After the first split, the treatment effect size is estimated in each of the two leaves, and if this effect size does not exceed a preset minimum, no tree is fit, and it is concluded that there is not sufficient evidence for any qualitative interaction (Dusseldorp et al., 2016). The final subgroups are described as “leaves,” each of which is classified into one of three classes: treatment responders (individuals who improved drinking outcomes when treated with varenicline), iatrogenic responders (individuals whose drinking outcomes were worse when treated with varenicline, i.e., placebo was favored), and nonresponders. This method is particularly well suited to detecting groups of treatment responders and nonresponders because it prioritizes differentiating groups of people who differ not only in the size of their treatment effect but also in its sign. QUINT is appropriate for (multi)categorical and continuous predictors and for continuous outcomes regardless of distribution.
Previous studies using QUINT have examined which subgroups of people with HIV and depressive symptoms benefited from a cognitive-behavioral intervention (van Luenen et al., 2020), the subgroups for whom different health interventions work (Formanoy et al., 2016), and differences in treatment effect between antidepressants and placebo in the treatment of major depression (Maruo et al., 2020).

The QUINT models were fit using the quint package in R (Dusseldorp et al., 2016). We included 36 potential moderators (see Table S1 in the Supplemental Material), treatment arm (varenicline or placebo) as the treatment variable, and proportion change in drinking endpoint (primary outcome: PHDD; secondary outcomes: DPD, DPDD, and PVHDD) as the outcome. The treatment-effect-magnitude component of the splitting criterion can use either the mean difference in the outcome or Cohen’s d. We selected the mean difference so that differences in variance across groups would not affect the selection process. For the initial split, QUINT models require a minimum effect size that counts as a qualitative difference; if no split reaches this threshold, no tree is fit. We selected an effect size of 0.4 because prior simulation research suggests that this threshold is appropriate for managing the Type I error rate at sample sizes of approximately 200 (Dusseldorp & Van Mechelen, 2014). We used 200 bootstrap samples and specified that a leaf must contain a minimum of five participants from each group. Initial trees allowed up to seven leaves and were then pruned on the basis of the bootstrap samples; this pruning process avoids overfitting. Follow-up trees were limited to four leaves to improve clinical interpretability; we report these results below. To account for missing values on predictor variables, we used the missingness-incorporated-in-attributes method proposed by Twala et al. (2008). For each of the eight continuous predictors with missing values, two versions of the variable were created: one in which missing values were imputed as a value lower than the minimum and one in which missing values were imputed as a value higher than the maximum. Both versions were entered into the QUINT models. For each of the four categorical predictors with missing values, missing values were coded as a separate factor level.
Participants with a missing outcome (n = 1) were excluded.
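The missingness-incorporated-in-attributes encoding described above can be sketched in a few lines. The helper names, the use of `None` for missing values, and the offset of 1 below the minimum or above the maximum are assumptions for illustration; because tree splits depend only on the ordering of values, any value outside the observed range places the missing cases in the same position.

```python
def mia_continuous(values):
    """Return two copies of a continuous predictor for a tree model: missing
    entries (None) placed below the observed minimum in one copy and above
    the observed maximum in the other."""
    observed = [v for v in values if v is not None]
    low, high = min(observed), max(observed)
    below_min = [low - 1 if v is None else v for v in values]
    above_max = [high + 1 if v is None else v for v in values]
    return below_min, above_max


def mia_categorical(values, missing_level="missing"):
    """Treat a missing categorical value as its own level (the label must not
    clash with an existing level)."""
    return [missing_level if v is None else v for v in values]


heart_rate = [60, None, 72]                     # hypothetical values
hr_low, hr_high = mia_continuous(heart_rate)    # both copies are entered as predictors
beverage = mia_categorical(["beer", None, "wine"])
```

With both copies available, a split can send missing cases either with the lowest or with the highest observed values, whichever better separates the treatment effect.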

Glinternet

Glinternet is a variation on lasso that is designed to appropriately capture interactions while maintaining strong hierarchy (Lim & Hastie, 2015). Strong hierarchy is the concept that if an interaction term is included in a linear model, so too should its lower-order terms. Basic lasso models with interactions do not necessarily follow strong hierarchy because they may select the interaction term while shrinking the lower-order coefficients to zero. Glinternet models use the group lasso to select sets of predictors (e.g., two lower-order coefficients and their interaction) to ensure strong hierarchy; however, the group lasso alone would not be sufficient because it should be possible for main effects to be selected into the model without the interaction term. Glinternet achieves this by also including the main effects as separate candidate predictors; because of the regularization, both the main effects and the sets of coefficients (lower order and interactions) can be included in the model simultaneously. Unlike QUINT, this method still prioritizes accurate prediction of the outcome, but it has shown promise as a method for identifying groups of treatment responders (Formanoy et al., 2016; Lipkovich et al., 2017; Maruo et al., 2020; van Luenen et al., 2020) and performed better than stepwise selection methods for this purpose (Wester et al., 2022).
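Group selection relies on the group lasso penalty, which acts on the Euclidean norm of a whole block of coefficients. The sketch below shows its proximal (group soft-thresholding) step, a standard building block of group lasso solvers; it is not glinternet's implementation, and the example coefficients for a hypothetical (X, Z, XZ) group are made up.

```python
import math


def group_soft_threshold(v, lam):
    """Proximal step for the penalty lam * ||v||_2 on one coefficient group.
    If the group's Euclidean norm is at most lam, every coefficient in the group
    is set to exactly zero; otherwise the whole vector is shrunk proportionally."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        return [0.0] * len(v)
    scale = 1.0 - lam / norm
    return [scale * x for x in v]


group = [0.3, 0.4, 0.0]                      # hypothetical (X, Z, XZ) coefficients, norm 0.5
dropped = group_soft_threshold(group, 0.6)   # the whole group leaves the model
kept = group_soft_threshold(group, 0.25)     # the whole group stays, shrunk by half
```

Because the group enters or leaves as a unit, an interaction selected this way always comes with its lower-order terms.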

We fit these models using the glinternet package in R (Lim & Hastie, 2015). The model included the same 36 predictors as the QUINT model plus treatment arm, and the outcome variable was proportion change in drinking behavior (PHDD, DPD, DPDD, PVHDD). All two-way interactions between the 36 predictors and treatment arm were entered into the model. Tenfold cross-validation was used to select the tuning parameter for the model, and the value of the tuning parameter that minimized mean squared error for out-of-sample prediction was selected. Out-of-sample prediction, the ability of the model to accurately predict cases that were not used to fit it, is the criterion used for machine-learning methods because these methods are designed to avoid overfitting to the sample and to optimize prediction of new cases. Cross-validation approximates out-of-sample prediction by splitting the sample into parts (e.g., 10 parts), fitting the model to all parts but one (e.g., nine parts), and using that model to predict the omitted part. This process is repeated 10 times, leaving out each part once, and the mean squared error for predicting the omitted parts is averaged. This entire process is repeated for many different values of the tuning parameter, and the tuning parameter with the lowest mean squared error is selected. On the basis of the model with the selected tuning parameter, we examined which coefficients were nonzero, particularly which interactions with treatment arm were included, to determine factors that may define treatment responders. Plots were generated to visualize these interactions and determine the direction of effects. This process was repeated for all four outcome variables. To accommodate missing data, we followed the approach recommended by Gunn et al. (2022) for multiple imputation with lasso.
We generated 100 imputed data sets using fully conditional specification in BLIMP (Enders et al., 2019) and analyzed each data set with the process described above. We recorded the proportion of data sets for which each interaction with treatment arm was selected and compared these results with those of the analysis with listwise deletion.
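The tuning-parameter search and the pooling over imputed data sets described above can be sketched end to end. To stay short, the sketch swaps in a plain lasso fitted by coordinate descent for glinternet's hierarchical group-lasso fit, uses simulated data and a hypothetical λ grid, and represents each imputed analysis by the set of terms it selected (term names hypothetical). Only the procedural logic mirrors the text: the same folds for every λ, held-out mean squared error averaged over 10 folds, the smallest error wins, and selection is tallied as a proportion of data sets.

```python
from collections import Counter

import numpy as np


def soft_threshold(z, lam):
    """Shrink z toward 0 by lam, returning exactly 0 when |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Plain lasso via cyclic coordinate descent, minimizing
    (1/2n) * ||y - X b||^2 + lam * ||b||_1.
    Assumes X and y are centered and no column of X is constant."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]              # remove feature j's contribution
            rho = X[:, j] @ resid / n
            b[j] = soft_threshold(rho, lam) / col_ss[j]
            resid -= X[:, j] * b[j]              # add the updated contribution back
    return b


def cv_mse(X, y, lam, k=10, seed=0):
    """Average held-out mean squared error over k folds for one lam.
    A fixed seed gives every lam the same folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        b = lasso_cd(X[train] - x_mean, y[train] - y_mean, lam)
        pred = y_mean + (X[test] - x_mean) @ b   # predict the omitted fold
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))


def select_lambda(X, y, lambdas, k=10):
    """Return the lam with the lowest cross-validated MSE, plus all scores."""
    scores = {lam: cv_mse(X, y, lam, k) for lam in lambdas}
    return min(scores, key=scores.get), scores


def selection_frequencies(selected_per_imputation):
    """Proportion of imputed data sets in which each term was selected."""
    m = len(selected_per_imputation)
    # set() guards against a term being listed twice for the same data set.
    counts = Counter(t for terms in selected_per_imputation for t in set(terms))
    return {term: count / m for term, count in counts.items()}


# Simulated data: 200 cases, 10 predictors, only the first one matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 0.5 * X[:, 0] + rng.normal(scale=0.3, size=200)
best_lam, scores = select_lambda(X, y, [0.001, 0.01, 0.05, 0.1, 0.3])

# Hypothetical selected terms from three imputed data sets (the study used 100).
freqs = selection_frequencies([
    {"age:treatment", "craving:treatment"},
    {"age:treatment"},
    {"age:treatment", "race:treatment"},
])
```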

Results

QUINT results

Primary outcome: PHDD

A total of 199 randomly assigned participants were included in the analysis; one participant was excluded because of a missing outcome. The analysis resulted in a pruned tree with four leaves (see Fig. 1), each representing a subgroup of patients. In three of the four leaves, varenicline was more effective than placebo at reducing PHDD (displayed in green in Fig. 1); in the remaining leaf, placebo was more effective than varenicline (displayed in red). The tree branches indicate the characteristics (variables) that define the patient subgroups. Among the treatment-responder subgroups (i.e., the three leaves that showed a benefit of varenicline over placebo in reducing PHDD), patients who were older at study onset (> 49.5 years) and treated with varenicline reduced PHDD more than patients of a similar age treated with placebo (Leaf 4). Among younger patients (≤ 49.5 years), those with a lower resting heart rate (≤ 63.5 beats per minute; Leaf 1), and those with a higher resting heart rate (> 63.5) who smoked more than 10.5 cigarettes per day on average over the past week (Leaf 3), reduced PHDD more when treated with varenicline than did patients with a similar age, health, and smoking profile treated with placebo. Finally, younger patients with a higher resting heart rate who smoked 10.5 or fewer cigarettes per day on average reduced PHDD more when treated with placebo than did patients with a similar profile treated with varenicline (Leaf 2). The y-axis on the leaf plots is the estimated mean difference in proportion reduction in PHDD. For example, among patients who were 49.5 years or younger and had a resting heart rate of 63.5 or lower, the estimated mean difference in proportion reduction between varenicline and placebo is 0.31. That is, people in this group are expected to reduce their heavy drinking by an additional 31% of their baseline level when assigned to varenicline rather than placebo.
The number of participants assigned to each medication condition is listed in the bottom left corner of each leaf. Exact summary statistics for each leaf are available in Table 1. One useful way to translate these proportion reductions is to apply them to the average baseline PHDD of .88, which means that, on average, participants drank heavily on about 26 of 30 days at baseline. For participants in Leaf 1, the reduction in PHDD in the placebo condition translates to an expected 19 heavy drinking days out of 30 at the end of the study, whereas an average participant in the varenicline condition is expected to have 11 heavy drinking days out of 30. For a similar translation in the other leaves, see Table 1. For results of the QUINT models for the secondary drinking outcomes, see the Supplemental Material.
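The pruned tree can be written as a few nested conditions, which makes the subgroup definitions easy to check. This sketch encodes the thresholds reported above (age, resting heart rate, and past-week average cigarettes per day); the function name and the example participant are hypothetical, and the tree is a descriptive summary of this sample rather than a validated clinical decision rule.

```python
def phdd_leaf(age, resting_hr, cigs_per_day):
    """Return the leaf (1-4) of the pruned PHDD tree for one participant.
    cigs_per_day is the average number of cigarettes per day over the past week."""
    if age > 49.5:
        return 4          # older participants
    if resting_hr <= 63.5:
        return 1          # younger, lower resting heart rate
    if cigs_per_day > 10.5:
        return 3          # younger, higher heart rate, heavier smoking
    return 2              # younger, higher heart rate, lighter smoking


# P1 = varenicline predicted better than placebo; P2 = placebo predicted better.
LEAF_CLASS = {1: "P1", 2: "P2", 3: "P1", 4: "P1"}

example = LEAF_CLASS[phdd_leaf(age=42, resting_hr=70, cigs_per_day=5)]  # hypothetical participant
```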

Fig. 1.

Qualitative interaction tree (QUINT) for percent heavy drinking days. Results of the QUINT analysis for reduction in percent heavy drinking days. The vertical axis indicates the difference in means between the treatments. Each leaf in the tree is assigned to a subgroup (P1, P2, or P3). In the green leaves (Leaves 1, 3, and 4, labeled P1), treatment with varenicline was more beneficial than treatment with placebo, whereas for the red leaf (Leaf 2, labeled P2), treatment with placebo was more beneficial than treatment with varenicline. The average-cigarettes-per-day variable was based on past-week estimates from the Smoking Quantity and Frequency Questionnaire.

Table 1.

Qualitative Interaction Tree PHDD Results

	Placebo				Varenicline				Comparison
	n	M	SD	Exp. HDD	n	M	SD	Exp. HDD	ΔM	SE	Subgroup
Leaf 1	12	−0.27	0.36	19	19	−0.57	0.33	11	0.31	0.13	P1
Leaf 2	38	−0.61	0.38	10	30	−0.26	0.32	20	−0.35	0.09	P2
Leaf 3	12	−0.32	0.34	18	7	−0.51	0.30	13	0.20	0.16	P1
Leaf 4	39	−0.42	0.39	15	42	−0.69	0.36	8	0.27	0.08	P1

Note: PHDD = percent heavy drinking days; n = number of participants in each condition in each leaf; M = observed mean proportion change in PHDD; SD = observed standard deviation of proportion change in PHDD; Exp. HDD = expected heavy drinking days out of 30, calculated from M assuming a baseline PHDD of .88 (about 26 heavy drinking days out of 30); ΔM = observed mean difference in proportion change in PHDD (placebo minus varenicline); SE = estimated standard error of ΔM; P1 = predicted greater reduction in PHDD among participants in the varenicline condition than in the placebo condition; P2 = predicted greater reduction in PHDD among participants in the placebo condition than in the varenicline condition.
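The Exp. HDD columns of Table 1 follow from simple arithmetic on the observed means. Rounding 30 × .88 × (1 + M) to the nearest whole day reproduces every Exp. HDD entry; using a baseline of exactly 26 days instead would give 19 rather than 20 for the Leaf 2 varenicline cell. This reconstruction is our assumption about how the column was computed, and the function name below is hypothetical.

```python
BASELINE_PHDD = 0.88  # average baseline proportion of heavy drinking days
DAYS = 30             # length of the reference window in days


def expected_heavy_days(mean_prop_change: float,
                        baseline: float = BASELINE_PHDD,
                        days: int = DAYS) -> int:
    """Expected heavy drinking days out of `days` after applying a mean proportion change."""
    return round(days * baseline * (1 + mean_prop_change))


# Observed mean proportion change in PHDD (placebo, varenicline) from Table 1.
table1_means = {
    "Leaf 1": (-0.27, -0.57),
    "Leaf 2": (-0.61, -0.26),
    "Leaf 3": (-0.32, -0.51),
    "Leaf 4": (-0.42, -0.69),
}
for leaf, (placebo_m, varenicline_m) in table1_means.items():
    print(leaf, expected_heavy_days(placebo_m), expected_heavy_days(varenicline_m))
```

Because M is a proportion change relative to each participant's baseline, the translation scales the baseline rather than subtracting M from it.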

Glinternet results

Primary outcome: PHDD

The cross-validation error for the model was 0.07, indicating that in the cross-validation process, the average squared error for out-of-sample prediction was 0.07; this corresponds to a root-mean-squared error of about 0.26, or about 26% in proportion change in PHDD. The glinternet model for proportion change in PHDD highlighted several features that interacted with treatment arm to define treatment responders, including participant age, medication dose, craving, race, age of first drink, drinking goal after treatment, several AUD diagnostic criteria, and number of cigarettes smoked in the past week. The interaction of participant age and treatment arm was such that the effect of varenicline was strongest among older individuals, but across all observed ages, varenicline reduced PHDD more than placebo. Medication dose interacted with treatment arm such that individuals in the varenicline group had lower PHDD than those in the placebo group overall, whereas individuals in the placebo group who took more doses of placebo had a less favorable proportion change in PHDD. Race, summarized as White (n = 145), Black (n = 57), and more than one race/other race (n = 8), interacted with treatment arm such that individuals who identified as White or as more than one race/other race had a beneficial response to varenicline in reducing PHDD, whereas for individuals who identified as Black, there was no difference in PHDD between participants treated with varenicline and those treated with placebo (see Fig. 2).
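The cross-validation error reported above is an average of squared out-of-fold prediction errors, and its square root puts it back on the scale of the outcome. The sketch below illustrates that quantity under stated assumptions: folds are assigned by index, and a hypothetical intercept-only model stands in for the fitted glinternet model (glinternet's own cross-validation routine is not reproduced here).

```python
import math
from statistics import fmean
from typing import Callable, List, Sequence

# A "fit" function takes training x and y and returns a prediction function.
Fit = Callable[[List[float], List[float]], Callable[[float], float]]


def kfold_cv_error(x: Sequence[float], y: Sequence[float], fit: Fit, k: int = 5) -> float:
    """Mean squared error of out-of-fold predictions (fold = index mod k)."""
    sq_errors: List[float] = []
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        sq_errors += [(y[i] - predict(x[i])) ** 2 for i in test]
    return fmean(sq_errors)


def fit_mean_only(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Hypothetical intercept-only baseline: predicts the training mean for everyone."""
    m = fmean(ys)
    return lambda _x: m


x = [float(v) for v in range(10)]
y = [float(v) for v in range(10)]
mse = kfold_cv_error(x, y, fit_mean_only)
print(mse, math.sqrt(mse))

# A cross-validation MSE of 0.07 corresponds to an RMSE of about 0.26.
print(round(math.sqrt(0.07), 2))  # 0.26
```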

Fig. 2.

Group least absolute shrinkage and selection operator interaction net (glinternet) plots for percent heavy drinking days (PHDD). Features from the glinternet model for predicted percent change in PHDD with quantitative interactions are displayed. The predicted percent change in PHDD is on the vertical axis; the categories of each predictor are on the horizontal axis. Abst = abstinence; temp abst = temporary abstinence. The criterion of alcohol taken in larger amounts or over longer periods of time was measured using Item 12c (“During the times when you drank alcohol, did you end up drinking more than you planned when you started?”) from the MINI Alcohol Dependence/Abuse Module. (a) Drinking goal interacted with treatment arm such that individuals who reported a harm-reduction goal or did not have a goal had better outcomes when treated with varenicline; for individuals who reported an abstinence or temporary-abstinence goal, there was no difference in outcome between varenicline and placebo. (b) Race interacted with treatment arm such that individuals who identified as White or as more than one race/other race had a beneficial response to varenicline in reducing PHDD; for individuals who identified as Black, there was no difference in PHDD between participants treated with varenicline and those treated with placebo. (c) Endorsement of the alcohol use disorder symptom of drinking more than planned interacted with treatment arm such that participants who did not endorse this criterion had lower PHDD when treated with varenicline than when treated with placebo.

Craving, as measured by the PACS, interacted with treatment arm such that the effect of varenicline was largest among participants with low craving scores, but across all observed craving levels, varenicline reduced PHDD more than placebo. Age of first drink interacted with treatment arm such that the beneficial effect of varenicline was greatest among individuals who were younger at their first drink. Drinking goal, summarized as abstinence, temporary abstinence, controlled drinking, occasional drinking, and no goal, interacted with treatment arm such that individuals who reported a harm-reduction goal or did not have a goal had better PHDD outcomes when treated with varenicline; for individuals who reported an abstinence or temporary-abstinence goal, there was no difference in PHDD between varenicline and placebo (see Fig. 2). In addition, several AUD diagnostic criteria interacted with treatment arm. The first was drinking more than planned: participants who did not endorse this criterion had lower PHDD when treated with varenicline than with placebo (see Fig. 2). The second was spending less time on other activities because of drinking: participants who did not endorse this criterion had lower PHDD when treated with varenicline than with placebo. The third was drinking despite negative consequences: participants who endorsed this criterion had lower PHDD when treated with varenicline than with placebo.
Finally, the number of cigarettes smoked in the past week interacted with treatment arm such that as the number of cigarettes smoked increased, the benefit of varenicline over placebo on PHDD also increased (i.e., there was little benefit of varenicline when the number of cigarettes smoked was 0 and a large benefit of varenicline over placebo when the number of cigarettes smoked was 25). For plots of the qualitative interactions, see Figure 2. The results of the glinternet models for the secondary outcomes are available in the Supplemental Material.
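Several of these moderators (age, craving, cigarettes smoked) change the size of the varenicline benefit without ever reversing it, whereas the QUINT tree found a subgroup (Leaf 2) in which the better arm switches. The sketch below is a hypothetical illustration of that distinction, assuming treatment-effect estimates (placebo mean minus varenicline mean change in PHDD) have already been computed at several moderator values; the function name and tolerance parameter are our own and are not part of glinternet or quint.

```python
from typing import Sequence


def classify_interaction(effects: Sequence[float], tol: float = 0.0) -> str:
    """Label a treatment-by-moderator interaction from treatment-effect estimates
    (placebo mean minus varenicline mean change in PHDD) at several moderator values.

    Positive values favor varenicline, negative values favor placebo, and effects
    within +/- tol are treated as no difference.
    """
    favors_varenicline = any(e > tol for e in effects)
    favors_placebo = any(e < -tol for e in effects)
    if favors_varenicline and favors_placebo:
        return "qualitative"   # the better arm switches across subgroups
    if max(effects) - min(effects) > tol:
        return "quantitative"  # same better arm everywhere, different size of benefit
    return "none"              # effect essentially constant across subgroups


# Leaf-level ΔM values from Table 1: Leaf 2 favors placebo, the others varenicline.
print(classify_interaction([0.31, -0.35, 0.20, 0.27]))  # qualitative
# Hypothetical effects at low, medium, and high craving: always positive, shrinking.
print(classify_interaction([0.30, 0.20, 0.10]))          # quantitative
```

A nonzero `tol` can be used to ignore sign flips that are too small to matter clinically; choosing that threshold is a substantive decision, not a statistical one.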

For the multiple-imputation analysis of the glinternet model, Table S2 in the Supplemental Material summarizes the proportion of imputed data sets in which each interaction with treatment arm was selected. Age and drinking goal were selected in 100% of the imputed data sets, and race and heart rate were selected in 98% and 97%, respectively. Craving was selected in 49% of the imputed data sets, and no other variable was selected in more than 20%. Cross-validation error in the imputed data sets was similar to that in the listwise-deleted data set (range = 0.26–0.28, M = 0.27). The tuning parameter λ was also similar to the value selected in the listwise-deleted data set (range = 10–17, M = 12.16).
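Once the treatment interactions selected in each imputed-data fit have been collected, the selection proportions summarized in Table S2 reduce to a simple tally. The sketch below assumes a hypothetical input format in which each fit is represented as a set of selected moderator names; fitting glinternet to each imputed data set is not shown, and the example selections are illustrative, not the study's results.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def selection_proportions(selected_per_imputation: Iterable[Set[str]]) -> Dict[str, float]:
    """Proportion of imputed data sets in which each treatment interaction was selected."""
    fits = list(selected_per_imputation)
    # Each set counts a term at most once per imputed data set.
    counts = Counter(term for selected in fits for term in selected)
    return {term: counts[term] / len(fits) for term in counts}


# Hypothetical selections from four imputed data sets (illustrative only).
fits = [
    {"age", "drinking_goal", "race"},
    {"age", "drinking_goal", "race", "craving"},
    {"age", "drinking_goal"},
    {"age", "drinking_goal", "race", "craving"},
]
props = selection_proportions(fits)
for term, p in sorted(props.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {p:.0%}")
```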

Discussion

This secondary analysis of a large, multisite clinical trial of varenicline for AUD sought to identify treatment responders using two distinct machine-learning methods: QUINT and glinternet. Both approaches identified older individuals and participants who smoked more as potential responders to varenicline, that is, as showing larger reductions in PHDD. This key finding is consistent with a theory-based moderator analysis suggesting that smokers may respond better to varenicline for the treatment of AUD (Falk et al., 2015) and with positive varenicline trials for drinking outcomes conducted in smoking samples (Fucito et al., 2011; Mitchell et al., 2012). The alignment between data-driven and theory-driven findings is key because the benefits of machine-learning models are predicated on their ability to generate generalizable results.

A more in-depth interpretation of the QUINT results for the primary outcome (PHDD) showed that patients who were older at study onset (> 49.5 years) and treated with varenicline had a greater reduction in PHDD than patients of a similar age treated with placebo. Among younger patients (≤ 49.5 years), those with a lower resting heart rate (≤ 63.5), and those with a higher resting heart rate (> 63.5) who smoked more than 10.5 cigarettes per day on average over the past week, had a greater reduction in PHDD when treated with varenicline than patients with a similar age, health, and smoking profile treated with placebo. This finding suggests that not only smoking status and age but also a measure of cardiovascular health (resting heart rate) were needed to identify treatment responders. The benefits of varenicline for these groups are clinically meaningful: they correspond to an expected 8 to 13 heavy drinking days out of 30, a substantial reduction from the average baseline of 26 heavy drinking days. In contrast, younger patients with a higher resting heart rate who smoked fewer cigarettes per day, on average, experienced an iatrogenic effect of varenicline treatment, as evidenced by higher PHDD among individuals treated with varenicline than among placebo-treated individuals with similar profiles.

A particularly useful output of the QUINT models is the estimated standardized effect size in each leaf. This information could be used to make projections about future RCTs that target likely responding populations. For example, the original study found an effect size of d = 0.31. A researcher planning a second clinical trial with 90% power to detect this effect would need to enroll at least 358 participants. Consider, instead, a clinical trial that recruited only participants older than 50, who have been identified as likely responders. The QUINT model estimates the effect size in this group to be d = 0.73, requiring only 66 participants to achieve the same power. This demonstrates how QUINT models could be used to identify likely responders and to plan future studies that select participants on the basis of likely responder status, thereby reducing the sample sizes required for well-powered clinical trials.
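The sample sizes quoted above can be checked with the standard normal-approximation formula for comparing two equal-sized arms, n per arm = 2(z₁₋α + z₁₋β)²/d². The text does not state the sidedness of the planned test; the figures of 358 and 66 are reproduced by a one-tailed test at α = .05 with 90% power, so the sketch below assumes that. The function name is our own, and exact t-test-based software may differ by a participant or two per arm.

```python
import math
from statistics import NormalDist


def total_n_two_group(d: float, power: float = 0.90, alpha: float = 0.05,
                      one_sided: bool = True) -> int:
    """Normal-approximation total sample size for a two-arm trial with equal
    allocation, to detect a standardized mean difference d."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_sided else z(1 - alpha / 2)
    z_beta = z(power)
    n_per_arm = math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)
    return 2 * n_per_arm


print(total_n_two_group(0.31))  # 358 (whole-sample effect size)
print(total_n_two_group(0.73))  # 66 (participants older than 50)
```

With `one_sided=False`, the same inputs give 438 and 80 participants, so the relative saving from enriching for likely responders is similar under either convention.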

Results of the glinternet model for the primary outcome showed a number of nuanced effects. Note that alcohol craving, as measured by the PACS, interacted with treatment arm such that the effect of varenicline was largest among participants with low craving scores, yet across all observed craving levels, varenicline reduced PHDD more than placebo. This is relevant given that a recent study specifically probing the craving-reducing properties of varenicline versus placebo reported null effects for cue-induced craving (Miranda et al., 2020). In another nuanced finding, drinking goal interacted with treatment arm such that individuals who reported a harm-reduction goal or did not have a goal had better PHDD outcomes when treated with varenicline; for individuals who reported an abstinence or temporary-abstinence goal, there was no difference in PHDD when treated with varenicline compared with placebo. We conclude that varenicline may be especially useful for individuals with a controlled drinking goal, presenting an opportunity for personalized medicine. On a related note, not endorsing specific AUD diagnostic criteria (i.e., having fewer specific AUD symptoms) predicted better clinical response to varenicline, which is consistent with a recent study in which lower AUD severity was associated with better clinical response (Donato et al., 2021). Finally, the number of cigarettes smoked in the past week interacted with treatment arm such that as the number of cigarettes smoked increased, the benefit of varenicline over placebo on PHDD also increased (i.e., there was little benefit of varenicline when the number of cigarettes smoked was 0 and a large benefit of varenicline over placebo when the number of cigarettes smoked was 25). This finding underscores the relationship between smoking and drinking outcomes for varenicline treatment. 
A recent trial by our laboratory found that among heavy-drinking smokers, varenicline alone produced higher smoking-cessation rates than the combination of varenicline plus naltrexone (Ray et al., 2021).

Although it is important to highlight the moderators identified using the machine-learning approaches, it is also of interest to note consistency in variables that were not identified as treatment moderators. Several variables were not identified as moderators by either the QUINT or the glinternet models, including alcohol-related consequences from the ImBIBe questionnaire, family history of AUD, alcohol withdrawal, and sex. These results are consistent with the findings of Falk and colleagues (2015), who used a theory-based approach to moderator testing and also did not identify these variables as significant moderators of treatment response to varenicline. The absence of sex as a potential treatment moderator is of particular interest given that some studies have found sex differences in drinking outcomes among individuals with AUD and comorbid smoking (Bold et al., 2019; O’Malley et al., 2018). Specifically, men treated with varenicline had better drinking outcomes relative to placebo both during the trial (O’Malley et al., 2018) and after treatment (Bold et al., 2019). These inconsistent findings may reflect differences in study design. The study by O’Malley and colleagues (2018) specifically recruited individuals with AUD who were comorbid smokers, whereas the multisite study of varenicline recruited individuals with AUD regardless of smoking status (Litten et al., 2013). The O’Malley et al. study also had a longer treatment duration and a more intensive behavioral intervention combined with pharmacotherapy, and its participants had more days abstinent and fewer heavy drinking days at baseline than participants in the multisite trial. These differences in study design and participant characteristics may account for the inconsistent findings regarding sex as a treatment moderator.
Both studies enrolled substantially more male than female participants (≈70% male in both studies); future studies should enroll equal numbers of male and female participants, blocked by smoking status, to fully explore sex differences in response to treatment with varenicline.

From a methodological viewpoint, in this study, we used two machine-learning approaches that have been suggested for accurately identifying treatment responders (Lipkovich et al., 2017). Using two methods allowed us to compare their results; however, future studies may select one of these methods rather than using both. Each method has relative advantages and disadvantages that should be considered when selecting a method in the future. Previous simulation research suggests that both QUINT and glinternet models can perform well with reasonably small samples (e.g., N = 200; Dusseldorp et al., 2016; Dusseldorp & Van Mechelen, 2014; Wester et al., 2022). The QUINT models prioritize identifying qualitative interactions, which are particularly important for identifying the desired subgroups. One particular advantage of the QUINT models is a built-in initial step that evaluates whether to proceed with identification of treatment subgroups; this step is key to avoiding potential false positives. However, the QUINT models were highly sensitive to the outcome (the trees differed depending on which outcome variable was used), and interpreting the higher-order interactions was so difficult that our research team did not feel comfortable interpreting trees with more than four leaves. On the other hand, the glinternet models were relatively easy to interpret and translate to linear models, with which psychologists tend to be more familiar. The limitations of the glinternet model, however, are that it frequently identified variables that were not useful for differentiating treatment-response profiles (i.e., it did not identify qualitative interactions) and that the fitted model was limited to two-way interactions, so it could not capture higher-order interactions.
In addition, the glinternet model may be prone to Type I errors because there is no omnibus test at the start of modeling to evaluate whether it is reasonable to pursue identifying interactions. Importantly, the glinternet models were relatively consistent across outcomes, suggesting potentially more robust results. Overall, the QUINT models appear better suited to identifying treatment-response types because they prioritize qualitative interactions. However, researchers using QUINT models should consider the limitations of the current software for handling missing data and the complexity of interpreting higher-order interactions.

Although we recommend using QUINT models for future work on identifying treatment responders, it is important not to overinterpret their results. In particular, the cut points used to create the trees (e.g., age > 49.5 years, heart rate > 63.5) are chosen to optimize prediction; they should not be interpreted as qualitative change points or piecewise transitions. The response to varenicline may increase or decrease continuously with a continuous moderator, yet the QUINT models will always place cutoffs within that continuum. In addition, the cutoffs are specific to the current data and, like any other statistic, are subject to sampling variability. For example, the age cutoff in the PHDD model (49.5 years) differed from the cutoff in the DPD model (54.5 years); this likely reflects sampling variability rather than a systematic difference between the two outcomes. Future research with larger, more representative samples could generate more reliable cutoffs, and new methods that quantify uncertainty around cutoffs would be beneficial in determining diagnostic decision rules. Although we believe the QUINT models likely do a good job of identifying an optimized cutoff given the data at hand, they do not provide an uncertainty estimate for that cutoff, so cutoffs should be interpreted with caution. Thus, we have focused our interpretations on general directionality (e.g., older, younger) rather than the specific cutoffs selected by the QUINT models.
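One way to see the sampling variability of a data-driven cutoff is to re-run the split search in bootstrap resamples. The sketch below is a hypothetical illustration on simulated data in which the drug's benefit increases smoothly with age, so there is no true change point. It uses a deliberately simplified split criterion (the absolute difference in treatment effect between the two sides), not QUINT's partitioning criterion, and all names and simulation settings are our own assumptions.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple

# (moderator value, arm: 1 = active drug / 0 = placebo, outcome: proportion change in PHDD)
Record = Tuple[float, int, float]


def treatment_effect(records: Sequence[Record]) -> float:
    """Placebo mean minus active mean; positive = larger PHDD reduction on the active drug."""
    active = [y for _, arm, y in records if arm == 1]
    placebo = [y for _, arm, y in records if arm == 0]
    return statistics.fmean(placebo) - statistics.fmean(active)


def best_cut(records: Sequence[Record], min_per_arm: int = 5) -> Optional[float]:
    """Midpoint cut on the moderator that maximizes the gap in treatment effect
    between the two sides (a simplified criterion, not QUINT's)."""
    values = sorted({x for x, _, _ in records})
    best, best_gap = None, -1.0
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        left = [r for r in records if r[0] <= cut]
        right = [r for r in records if r[0] > cut]
        # Skip cuts that leave too few participants in either arm on either side.
        if any(sum(1 for r in side if r[1] == arm) < min_per_arm
               for side in (left, right) for arm in (0, 1)):
            continue
        gap = abs(treatment_effect(left) - treatment_effect(right))
        if gap > best_gap:
            best, best_gap = cut, gap
    return best


def bootstrap_cuts(records: Sequence[Record], n_boot: int = 100, seed: int = 1) -> List[float]:
    """Re-select the cut point in bootstrap resamples to expose its sampling variability."""
    rng = random.Random(seed)
    cuts = []
    for _ in range(n_boot):
        sample = [rng.choice(records) for _ in records]
        cut = best_cut(sample)
        if cut is not None:
            cuts.append(cut)
    return cuts


# Simulated (hypothetical) data: benefit of the active drug grows linearly with age.
rng = random.Random(0)
data = []
for _ in range(200):
    age = rng.randint(21, 70)
    arm = rng.randint(0, 1)
    y = -0.3 - 0.01 * (age - 21) * arm + rng.gauss(0, 0.3)
    data.append((age, arm, y))

cuts = sorted(bootstrap_cuts(data))
q = statistics.quantiles(cuts, n=20)
print("cut in full sample:", best_cut(data))
print(f"bootstrap 5th-95th percentiles: {q[0]:.1f} to {q[-1]:.1f}")
```

A wide bootstrap interval around the selected cut is a reminder that a single estimated cutoff, like the age splits reported here, carries uncertainty that the tree itself does not display.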

In closing, in this study, we tested two machine-learning approaches to elucidating clinical response to varenicline versus placebo for the treatment of AUD. Results were generally consistent with theory-driven models and the extant literature in highlighting the role of smoking status (smokers responded better to varenicline), AUD severity (lower severity was associated with better response to varenicline), medication adherence, and drinking goal (individuals with a controlled-drinking goal responded better to varenicline). However, other variables, such as sex, were not selected as moderators in these analyses. This study is the first to demonstrate the shared contributions of all of these variables simultaneously and in competition with a broad set of other potential moderators. This study also produced novel findings, including the interaction between age and cardiovascular health in predicting clinical response. This is noteworthy given that smoking and alcohol use have detrimental effects on cardiovascular-health outcomes (Ebbert et al., 2005). In addition, participants with lower alcohol-craving levels responded better to varenicline relative to placebo, which is relevant given a recent study showing that varenicline did not blunt cue-induced craving in individuals with AUD compared with placebo (Miranda et al., 2020). This study confirms established predictors of response to varenicline for AUD while also presenting new, clinically useful directions. It adds to existing literature suggesting that varenicline may be particularly useful for individuals who drink alcohol heavily and also smoke cigarettes. A novel contribution is the suggestion that varenicline may be recommended for individuals with a controlled-drinking goal and individuals with lower levels of subjective alcohol craving.
Together, these findings have high potential for clinical dissemination because they elucidate key variables thought to identify responders and individuals for whom varenicline treatment may be iatrogenic. As the field continues to integrate machine-learning methods in personalized medicine, studies that can provide a nuanced methodological examination of machine-learning methods and effectively integrate them with the medication-development literature have the highest potential to inform clinical practice.

Supplemental Material

sj-doc-1-cpx-10.1177_21677026231169922 – Supplemental material for Identifying Treatment Responders to Varenicline for Alcohol Use Disorder Using Two Machine-Learning Approaches

Supplemental material, sj-doc-1-cpx-10.1177_21677026231169922 for Identifying Treatment Responders to Varenicline for Alcohol Use Disorder Using Two Machine-Learning Approaches by Erica N. Grodin, Amanda K. Montoya, Alondra Cruz, Suzanna Donato, Wave-Ananda Baskerville and Lara A. Ray in Clinical Psychological Science

Footnotes

Transparency

Action Editor: Tamika C. Zapolski

Editor: Jennifer L. Tackett

Author Contributions

Erica N. Grodin: Conceptualization; Funding acquisition; Investigation; Methodology; Project administration; Supervision; Writing – original draft; Writing – review & editing.

Amanda K. Montoya: Conceptualization; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Supervision; Validation; Visualization; Writing – original draft; Writing – review & editing.

Alondra Cruz: Formal analysis; Methodology; Visualization; Writing – review & editing.

Suzanna Donato: Data curation; Project administration; Writing – review & editing.

Wave-Ananda Baskerville: Data curation; Writing – review & editing.

Lara A. Ray: Conceptualization; Funding acquisition; Investigation; Methodology; Project administration; Supervision; Writing – original draft; Writing – review & editing.

ORCID iDs

Erica N. Grodin

Amanda K. Montoya

Lara A. Ray


References

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). APA.

Bakin (1999). Adaptive regression and model selection in data mining problems [Unpublished doctoral dissertation]. Australian National University.

Ballarini, N. M., Rosenkranz, G. K., Jaki, König, & Posch (2018). Subgroup identification in clinical trials via the predicted individual treatment effect. PLOS ONE, 13(10), Article e0205971. https://doi.org/10.1371/journal.pone.0205971

Bold, K. W., Zweben, Fucito, L. M., Piepmeier, M. E., Muvvala, Gueorguieva, & O’Malley, S. S. (2019). Longitudinal findings from a randomized clinical trial of varenicline for alcohol use disorder with comorbid cigarette smoking. Alcoholism: Clinical and Experimental Research, 43(5), 937–944. https://doi.org/10.1111/acer.13994

Bzdok & Meyer-Lindenberg (2018). Machine learning for precision psychiatry: Opportunities and challenges. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 3(3), 223–230. https://doi.org/10.1016/j.bpsc.2017.11.007

de Bejczy, Löf, Walther, Guterstam, Hammarberg, Asanovska, Franck, Isaksson, & Söderpalm (2015). Varenicline for treatment of alcohol dependence: A randomized, placebo-controlled trial. Alcoholism: Clinical and Experimental Research, 39(11), 2189–2199. https://doi.org/10.1111/acer.12854

Donato, Green, & Ray, L. A. (2021). Alcohol use disorder severity moderates clinical response to varenicline. Alcoholism: Clinical and Experimental Research, 45, 1877–1887. https://doi.org/10.1111/acer.14674

Dusseldorp, Conversano, & Van Os, B. J. (2010). Combining an additive and tree-based regression model simultaneously: STIMA. Journal of Computational and Graphical Statistics, 19(3), 514–530.

Dusseldorp, Doove, & van Mechelen (2016). Quint: An R package for the identification of subgroups of clients who differ in which treatment alternative is best for them. Behavior Research Methods, 48(2), 650–663. https://doi.org/10.3758/s13428-015-0594-z

Dusseldorp & van Mechelen (2014). Qualitative interaction trees: A tool to identify qualitative treatment–subgroup interactions. Statistics in Medicine, 33(2), 219–237.

Ebbert, J. O., Janney, C. A., Sellers, T. A., Folsom, A. R., & Cerhan, J. R. (2005). The association of alcohol consumption with coronary heart disease mortality and cancer incidence varies by smoking history. Journal of General Internal Medicine, 20(1), 14–20. https://doi.org/10.1111/j.1525-1497.2005.40129.x

Enders, C. K., & Keller, B. T. (2019). Blimp technical appendix: Fully Bayesian model-based estimation and imputation for multilevel models. www.appliedmissingdata.com/blimp-papers

Falk, D. E., Castle, I. J. P., Ryan, Fertig, & Litten, R. Z. (2015). Moderators of varenicline treatment effects in a double-blind, placebo-controlled trial for alcohol dependence: An exploratory analysis. Journal of Addiction Medicine, 9(4), 296–303. https://doi.org/10.1097/ADM.0000000000000133

Falk, D. E., Ryan, M. L., Fertig, J. B., Devine, E. G., Cruz, Brown, E. S., Burns, Salloum, I. M., Newport, D. J., Mendelson, Galloway, Kampman, Brooks, Green, A. I., Brunette, M. F., Rosenthal, R. N., Dunn, K. E., Strain, E. C., Ray, . . . National Institute on Alcohol Abuse and Alcoholism Clinical Investigations Group (NCIG) Study Group. (2019). Gabapentin enacarbil extended-release for alcohol use disorder: A randomized, double-blind, placebo-controlled, multisite trial assessing efficacy and safety. Alcoholism: Clinical and Experimental Research, 43(1), 158–169. https://doi.org/10.1111/acer.13917

Flannery, Volpicelli, & Pettinati (1999). Psychometric properties of the Penn alcohol craving scale. Alcoholism: Clinical and Experimental Research, 23(8), 1289–1295.

Formanoy, M. A. G., Dusseldorp, Coffeng, J. K., Van Mechelen, Boot, C. R. L., Hendriksen, I. J. M., & Tak, E. C. P. M. (2016). Physical activity and relaxation in the work setting to reduce the need for recovery: What works for whom? BMC Public Health, 16(1), Article 866. https://doi.org/10.1186/s12889-016-3457-3

Foster, J. C., Taylor, J. M., & Ruberg, S. J. (2011). Subgroup identification from randomized clinical trial data. Statistics in Medicine, 30(24), 2867–2880.

Friedman, Hastie, & Tibshirani (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 432–441.

Fucito, L. M., Toll, B. A., Romano, D. M., Tek, & O’Malley, S. S. (2011). A preliminary investigation of varenicline for heavy drinking smokers. Psychopharmacology, 215(4), 655–663. https://doi.org/10.1007/s00213-010-2160-9

Gueorguieva, Fucito, L. M., & O’Malley, S. S. (2015). Predictors of abstinence from heavy drinking during follow-up in COMBINE. Journal of Studies on Alcohol and Drugs, 76(6), 935–941. https://doi.org/10.15288/jsad.2015.76.935

Gunn, H. J., Hayati Rezvan, Fernández, M. I., & Comulada, W. S. (2022). How to apply variable selection machine learning algorithms with multiply imputed data: A missing discussion. Psychological Methods. Advance online publication. https://doi.org/10.1037/met0000478

Heatherton, T. F., Kozlowski, L. T., Frecker, R. C., & Fagerstrom, K. O. (1991). The Fagerström test for nicotine dependence: A revision of the Fagerstrom Tolerance Questionnaire. British Journal of Addiction, 86(9), 1119–1127.

Laska, E. M., Siegel, C. E., Lin, Bogenschutz, & Marmar, C. R. (2020). Gabapentin enacarbil extended-release versus placebo: A likely responder reanalysis of a randomized clinical trial. Alcoholism: Clinical and Experimental Research, 44(9), 1875–1884. https://doi.org/10.1111/acer.14414

Lê, A. D., Corrigall, W. A., Harding, J. W., & Juzytsch, T. K. (2000). Involvement of nicotinic receptors in alcohol self-administration. Alcoholism: Clinical and Experimental Research, 24(2), 155–163. https://doi.org/10.1111/j.1530-0277.2000.tb04585.x

Lim & Hastie (2015). Learning interactions via hierarchical group-lasso regularization. Journal of Computational and Graphical Statistics, 24(3), 627–654.

Lipkovich, Dmitrienko, & D’Agostino, R. B., Sr. (2017). Tutorial in biostatistics: Data-driven subgroup identification and analysis in clinical trials. Statistics in Medicine, 36(1), 136–196. https://doi.org/10.1002/sim.7064

Litten, R. Z., Ryan, M. L., Fertig, J. B., Falk, D. E., Johnson, Dunn, K. E., Green, A. I., Pettinati, H. M., Ciraulo, D. A., Sarid-Segal, Kampman, Brunette, M. F., Strain, E. C., Tiouririne, N. A., Ransom, Scott, Stout, & NCIG (National Institute on Alcohol Abuse and Alcoholism Clinical Investigations Group) Study Group. (2013). A double-blind, placebo-controlled trial assessing the efficacy of varenicline tartrate for alcohol dependence. Journal of Addiction Medicine, 7(4), 277–286. https://doi.org/10.1097/ADM.0b013e31829623f4

Mann, R. E., Sobell, L. C., Sobell, M. B., & Pavan (1985). Reliability of a family tree questionnaire for assessing family history of alcohol problems. Drug and Alcohol Dependence, 15(1–2), 61–67. https://doi.org/10.1016/0376-8716(85)90030-4

Maruo, Furukawa, T. A., Noma, Imai, Ikeda, & Yamawaki (2020). Qualitative treatment-subgroup interactions in the antidepressant treatment of major depression: Application of QUINT to individual participant data from seven placebo-controlled randomized controlled trials. Personalized Medicine in Psychiatry, 21, Article 100054. https://doi.org/10.1016/j.pmip.2019.100054

Miller, W. R. (1995). The Drinker Inventory of Consequences (DrInC): An instrument for assessing adverse consequences of alcohol abuse: Test manual. U.S. Department of Health and Human Services, Public Health Service.

Miller, W. R. (1996). Form 90: A structured assessment interview for drinking and related behaviors: Test manual [Monograph]. https://doi.org/10.1037/e563242012-001

Miranda, Jr., O’Malley, S. S., Treloar Padovano, Falk, D. E., Ryan, M. L., Fertig, J. B., Chun, T. H., Muvvala, S. B., & Litten, R. Z. (2020). Effects of alcohol cue reactivity on subsequent treatment outcomes among treatment-seeking individuals with alcohol use disorder: A multisite randomized, double-blind, placebo-controlled clinical trial of varenicline. Alcoholism: Clinical and Experimental Research, 44(7), 1431–1443.

Mitchell, J. M., Teague, C. H., Kayser, A. S., Bartlett, S. E., & Fields, H. L. (2012). Varenicline decreases alcohol consumption in heavy-drinking smokers. Psychopharmacology, 223(3), 299–306.

O’Malley, S. S., Zweben, Fucito, L. M., Piepmeier, M. E., Ockert, D. M., Bold, K. W., Petrakis, Muvvala, Jatlow, & Gueorguieva (2018). Effect of varenicline combined with medical management on alcohol use disorder with comorbid cigarette smoking: A randomized clinical trial. JAMA Psychiatry, 75(2), 129–138. https://doi.org/10.1001/jamapsychiatry.2017.3544

Plebani, J. G., Lynch, K. G., Rennert, Pettinati, H. M., O’Brien, C. P., & Kampman, K. M. (2013). Results from a pilot clinical trial of varenicline for the treatment of alcohol dependence. Drug and Alcohol Dependence, 133(2), 754–758. https://doi.org/10.1016/j.drugalcdep.2013.06.019

Ray, L. A., Green, Enders, Leventhal, A. M., Grodin, E. N., Hartwell, Venegas, Meredith, Nieto, S. J., Shoptaw, & Miotto (2021). Efficacy of combining varenicline and naltrexone for smoking cessation and drinking reduction: A randomized clinical trial. American Journal of Psychiatry, 178(9), 818–828. https://doi.org/10.1176/appi.ajp.2020.20070993

Sheehan, D. V., Lecrubier, Sheehan, K. H., Amorim, Janavs, Weiller, Hergueta, Baker, & Dunbar, G. C. (1998). The Mini-International Neuropsychiatric Interview (MINI): The development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. Journal of Clinical Psychiatry, 59(20), 22–33.

38. Sobell, L. C., & Sobell, M. B. (1992). Timeline follow-back: A technique for assessing self-reported alcohol consumption. In R. Z. Litten & J. P. Allen (Eds.), Measuring alcohol consumption: Psychosocial and biochemical methods (pp. 41–72). Humana Press/Springer Nature.

39. Tsai, C. L., Wang, & Nickerson, D. M. (2009). Subgroup analysis via recursive partitioning. Journal of Machine Learning Research, 10, 141–158.

40. Zhou, Yan, Fan, & Yang (2008). Interaction trees with censored survival data. The International Journal of Biostatistics, 4(1), Article 2. https://doi.org/10.2202/1557-4679.1071

41. Sullivan, J. T., Sykora, Schneiderman, Naranjo, C. A., & Sellers, E. M. (1989). Assessment of alcohol withdrawal: The revised Clinical Institute Withdrawal Assessment for Alcohol Scale (CIWA-Ar). British Journal of Addiction, 84(11), 1353–1357.

42. Szabo (1996). The World Health Organisation Quality of Life (WHOQOL) assessment instrument. In Spilker (Ed.), Quality of life and pharmacoeconomics in clinical trials (pp. 355–362). Lippincott Williams & Wilkins.

43. Twala, B. E., Jones, & Hand, D. J. (2008). Good methods for coping with missing data in decision trees. Pattern Recognition Letters, 29(7), 950–956.

44. van Luenen, Kraaij, Spinhoven, Dusseldorp, & Garnefski (2020). Moderators of the effect of guided online self-help for people with HIV and depressive symptoms. AIDS Care, 32(8), 942–948. https://doi.org/10.1080/09540121.2019.1679703

45. Werner, Rentz, Frank, Bowman, Duhig, & Moss (2008, June). Participant consequence measures. Annual Meeting of the Research Society on Alcoholism, Washington, DC, United States.

46. Wester, R. A., Rubel, & Mayer (2022). Covariate selection for estimating individual treatment effects in psychotherapy research: A simulation study and empirical example. Clinical Psychological Science, 10, 920–940. https://doi.org/10.1177/21677026211071043

Supplementary Material


