Abstract
Effect modification occurs while the effect of the treatment is not homogeneous across the different strata of patient characteristics. When the effect of treatment may vary from individual to individual, precision medicine can be improved by identifying patient covariates to estimate the size and direction of the effect at the individual level. However, this task is statistically challenging and typically requires large amounts of data. Investigators may be interested in using the individual patient data from multiple studies to estimate these treatment effect models. Our data arise from a systematic review of observational studies contrasting different treatments for multidrug-resistant tuberculosis, where multiple antimicrobial agents are taken concurrently to cure the infection. We propose a marginal structural model for effect modification by different patient characteristics and co-medications in a meta-analysis of observational individual patient data. We develop, evaluate, and apply a targeted maximum likelihood estimator for the doubly robust estimation of the parameters of the proposed marginal structural model in this context. In particular, we allow for differential availability of treatments across studies, measured confounding within and across studies, and random effects by study.
Keywords
Introduction
Multidrug-resistant tuberculosis (MDR-TB), a form of tuberculosis (TB) with high mortality, is caused by bacteria resistant to at least the two most effective anti-TB drugs, isoniazid and rifampicin. According to the World Health Organization (WHO) (2020), a global total of 206,030 patients with MDR-TB or rifampicin-resistant (RR)-TB were detected and notified of their infection in 2019, a
Due to the large data requirements for estimating effects across patient subgroups, investigators may be interested in using individual participant data (IPD) from multiple studies to fit these treatment effect models. In our study, the data were extracted from 31 observational studies 11 which contrasted different treatment regimens for patients with MDR-TB, where multiple antimicrobial agents are taken concurrently by a patient over a long period. Our objective is to perform an IPD meta-analysis 12 to investigate the impact of different patient and treatment characteristics on the average treatment effect (ATE) of 14 anti-TB medications.
In this project, we propose and evaluate a targeted maximum likelihood estimator (TMLE) 13 to model effect modification for the estimation of the parameters of the CATE MSM, in the IPD meta-analytical context. In Supplemental Materials, we also describe and evaluate a novel augmented inverse probability of treatment weighted estimator (A-IPTW). For estimation of the ATE, TMLE depends on two components: an outcome regression conditional on treatment and covariates; and, weights comprised of the inverse of the propensity score where the propensity score is the probability of treatment conditional on covariates. 14 TMLEs are plug in estimators with asymptotic properties that use a targeting step to optimize the bias-variance tradeoff for the target parameter. In our setting, the estimator we propose allows for differential availability of treatments across studies and random effects by study due to measured and unmeasured characteristics of the study-specific populations.
In the section “Pooled observational studies of MDR-TB,” we describe the MDR-TB data structure and our parameters of interest. The section “Models and algorithms” introduces the TMLE procedure. We also describe a clustered influence function-based variance estimator. In the section “Simulation study,” we present the results of simulation studies to demonstrate the properties of the estimator under different scenarios. Then in the section “MDR-TB data analysis,” we provide the results based on TMLE analysis of effect modification for 14 anti-TB medications using the combined IPD of the 31 observational studies.
Pooled observational studies of MDR-TB
The application data consist of IPD derived from 31 observational studies resulting in a total of 9290 MDR-TB patients. The data are available to any of the data contributors, but not publicly available. The systematic review 11 extended three previous systematic reviews.15–17 These IPD were collected from cohorts of adults with study years ranging from 1995 to 2009. The information collected for each patient includes demographics (age and sex), past TB history, clinical characteristics (pre-treatment sputum smear results for acid-fast bacilli (AFB) and culture, chest radiography, HIV infection), drug susceptibility test (DST) results, anti-microbial medications given, and outcomes. Our one-stage analytical approach pools all IPD across studies, accounting for clustering and random effects.
Data structure
Outcome
In the pooled dataset, the binary outcome
Treatments and treatment availabilities
There are 14 antimicrobial agents observed in the data: ethambutol (EMB), ethionamide (ETO), ofloxacin (OFX), pyrazinamide (PZA), kanamycin/amikacin (KM/AM), cycloserine (CS), capreomycin (CAP), para-aminosalicylic acid (PAS), prothionamide (PTO), streptomycin (SM), ciprofloxacin (CIP), later-generation fluoroquinolones (LgFQ), rifabutin (RIF), and group five level drugs (Gp5). LgFQ included levofloxacin, moxifloxacin, gatifloxacin, and sparfloxacin.
11
Gp5 comprised of amoxicillin-clavulanate, macrolides (azithromycin, roxithromycin, and clarithromycin), clofazimine, thiacetazone, imipenem, linezolid, high dose isoniazid, and thioridazine.
11
We use
Not all treatments are observed in each study; we assume that a given treatment was available for a given study’s participants based on whether we have observed that this treatment was taken by any patient in the given study.
19
The binary variable
Baseline covariates and resistance information
The baseline covariates consist of two study level covariates
Resistance information based on DST is defined as the binary variable
Observed data structure
The observed data can be written as
Definition of counterfactual notation
In this data, there is differential availability of treatments across studies. In order to define a generalized parameter of interest, we define counterfactual notation under the availability of a given treatment.
19
We define counterfactual exposure to treatment
Parameter of interest and assumptions
As in previous work, we aim to estimate the parameters of interest defined with respect to a global population which refers to the union of super-populations specific to each study in this dataset.20,19 We define a non-parametric structural equation model (NPSEM), which assumes a time ordered data generating structure, in Supplemental Materials Appendix A. Then the counterfactual likelihood is derived under NPSEM and the parameter of interest is defined with respect to this likelihood.
In this project, for a given medication
Identifying the parameter of interest requires some assumptions that allow us to write the parameter in terms of distributions of the observed data.
19
Under these assumptions, the CATE can be written as follows:
Models and algorithms
Outcome and propensity score models
For ease of notation, we will drop the notation
In our setting with treatment availability variable
For the propensity score, the probability of being treated for each patient is
Efficient influence function
The efficient influence function (EIF) for a specific parameter is the influence function that achieves the efficiency in the given space of semi-parametric models.
21
The EIF defines the linear approximation of any efficient and regular asymptotically linear estimator. The EIF for the coefficients
Targeted maximum likelihood estimator
The general TMLE procedure was proposed by van der Laan and Rubin.
23
Our proposed procedure takes estimates of the conditional expected outcome
The first step is to produce initial estimates for
We note that this TMLE solves the equation
Influence function-based variance estimator
Standard errors can be estimated for the TMLE using a large-sample sandwich estimator of the efficient influence function under consistency of
Under regularity conditions, we can write the linear approximation of the estimator as
19
In order to estimate the variance of the parameter of interest while taking clustering by studies into account, we only assume independence between studies, and not individuals within the same study. Within study
Simulation study
We conducted simulation studies in R to develop and evaluate TMLE and A-IPTW in the implementation to model effect modification for IPD meta-analysis. We demonstrate the double robustness and finite-sample performance of TMLE and A-IPTW. Results for the A-IPTW estimator are provided in Supplemental Materials Appendix E.
Data generation
For each dataset, we generated two continuous study-level covariates,
In this simulation study, we only aimed to estimate the parameters representing effect modification of treatment
Analysis
In order to model the effect modification of treatment
As we discussed in the section “Parameter of interest and assumptions,” we model the CATE of treatment
We used logistic regressions to estimate each component of
In this simulation, we applied the TMLE algorithms presented in the section “Targeted maximum likelihood estimator.” Note that the TMLE implementation we used requires that we transform the continuous
Results
Figure 1 presents the results of TMLE for the estimation of five potential effect modifiers under the four estimation scenarios and three different sample sizes with outcomes generated without random effects. Figure 2 presents the TMLE results under random effects. The coverage rates are presented in the blue boxes of the two figures. More detailed TMLE results are provided in Supplemental Materials Appendix E (Tables S2 to S4), along with the figures and tables of the results of the A-IPTW estimator (Tables S5 to S7 and Figures S1, S2).

Error of TMLE estimates under four scenarios and three different sample sizes without random effects. The

Error of TMLE estimates under four scenarios and three different sample sizes with random effects. The
From Figures 1 and 2, we see that under the first two scenarios, where the model for
The coverage rates, given in the blue boxes of Figures 1 and 2, typically increased with the number of studies. For instance, with random effects with 10 studies, in scenario 1 where all models were correctly specified, the coverage rates for the five potential effect modifiers were between
MDR-TB data analysis
Descriptive statistics of MDR-TB
The combined dataset contained the IPD from 31 observational studies with a total of 9290 patients. After removing 260 (
In order to assess the data support needed to investigate effect modification by concurrent medication, Table 1 displays the number of patients who used a combination of any two medications, with the diagonal indicating the total number of patients taking the corresponding medication. Values ranged between 87 and 4574, indicating at least minimal data support for all pairwise combinations.
Summary of number of patients taking any combinations for any two medications during the treatment period. The diagonal values represent the total number of patients taking each medication.
EMB: ethambutol; ETO: ethionamide; OFX: ofloxacin; PZA: pyrazinamide; KM/AM: kanamycin/amikacin; CS: cycloserine; CAP: capreomycin; PAS: para-aminosalicylic acid; PTO: prothionamide; SM: streptomycin; CIP: ciprofloxacin; LgFQ: later-generation fluoroquinolones; RIF: rifabutin; Gp5: group five level drugs.
Analysis and results of MDR-TB
For each medication
As noted, there are missing values in the individual-level covariates. We used multiple imputation
28
by chained equations with the
In each completed dataset, variance estimates for the coefficients were computed using the sample variance of the influence function following the expression in the section “Influence function-based variance estimator.” Finally, since many comparisons made in this analysis, we performed a multiple testing adjustment of the “significance level” of the p-values to control the false discovery rate via the method of Benjamini and Hochberg 35 with further details given in Supplemental Materials Appendix G.
Figures 3 to 5 show the estimated coefficients, standard errors, and

Estimated coefficients and the corresponding

Estimated coefficients of potential effect modifiers and the corresponding

Estimated coefficients of potential effect modifiers and the corresponding
The empirical distributions of the untruncated propensity scores for all drugs are provided in Supplemental Materials Appendix F (Table S13 and Figure S4). For later-generation fluoroquinolones, we noted very large weights which likely yielded the large variability observed in Figure 5, LgFQ plot. In addition, since fewer patients were prescribed later-generation fluoroquinolones among those who were HIV positive (Appendix F Figure S3), the standard errors were inflated for the coefficient of HIV in the fluoroquinolones MSM (Figure 3, LgFQ plot). Also, among those who took rifabutin, only 87/866 subjects were also prescribed later-generation fluoroquinolones, giving rise to the large standard error (Figure 5, RIF plot).
To highlight the differences between meta-analysis and individual study results, we compared the results of Mitnick et al., conducted in 1996–200236,37 for the effect modification of ethionamide. Given that all 14 medications were observed in only two studies (Appendix F Table S9), Mitnick et al. (710 subjects) and Tupasi et al. (170 subjects), we chose Mitnick et al. to make the comparison. In Figure 6, we see that a single-study TMLE analysis of the data from the study by Mitnick et al. concluded that prothionamide was associated with greater estimated effects of ethionamide while group 5 drugs and rifabutin were associated with lower estimated effects. Both the meta-analysis and the single study results agreed that cycloserine and capreomycin were effect modifiers (positive and negative, respectively). As expected, the estimation in the meta-analysis had lower standard errors than the individual study (Figure 6). The deviation of the results suggests heterogeneity between different studies in the meta-analysis. It is important to note that analysis of an individual study targets a parameter that is interpreted in the individual study’s super-population while the meta-analysis targets the parameter interpreted in the global population. These parameters may not coincide when there is heterogeneity.

Estimated coefficients of potential effect modifiers and the corresponding
Discussion
In this paper, we developed one-stage doubly-robust methods for the analysis of baseline effect modification in an IPD meta-analysis. The model space that we considered was nonparametric though the parameters of interest were defined through a working linear MSM for the CATE. Our approach allowed us to analyze pooled IPD from multiple studies in order to evaluate how estimated treatment effects may vary depending on the values of patient covariates. Our past work, which proposed related methods and a TMLE for IPD meta-analysis with multiple treatments, instead estimated a treatment importance metric defined as the difference in adjusted probabilities of treatment success between the patients who used each medication and the overall population. 19 Our current work proposes a novel TMLE and A-IPTW for the estimation of effect modification in the described marginal structural model in an IPD meta-analysis.
Vo et al. 38 illustrated that in IPD meta-analysis, heterogeneity across studies can come from two sources: case-mix heterogeneity, due to effect modification, and beyond case-mix heterogeneity, due to differences in study design and measurement. Our methods address heterogeneity by allowing for differential availability of treatments across studies and random effects by study due to measured and unmeasured characteristics of the study-specific populations.
In clinical and epidemiological research, model misspecification is always a concern when estimating treatment or exposure effects. Doubly robust methods yield consistent estimators even under misspecification of either the treatment or the outcome model. In the simulation study, we demonstrated the double robustness property of both TMLE and A-IPTW. We observed similar performance of TMLE and A-IPTW but we did not investigate near-positivity violations or other scenarios that may differentiate them in finite samples as have others.25,39 In addition, we demonstrated the double robustness of both methods when there are study-specific random effects for the outcome. Finally, we showed that the proposed confidence intervals, estimated using the influence function sandwich estimator that considers clustering by study, performed well when there were greater than 30 studies in the analysis. Indeed, a limitation of our approach is that it relies on a sufficient number of studies to estimate the generalized parameter interpreted in the global population. In particular, the ability to adjust for confounding by treatment availability depends on fitting a model for treatment availability conditional on study-level covariates, which is limited by the typically small number of studies in a meta-analysis. Indeed, we observe in the simulation study that error may persist when the number of studies is small and the outcome regression model is incorrectly specified, even when the propensity score model components are correctly specified with parametric models. Therefore, our approach should only be undertaken when a larger number of studies are available.
The treatment of MDR-TB is challenging because of its prolonged duration, toxicity, costs and unsatisfactory outcomes. 1 Second-line TB medicines used for the treatment of drug-resistant TB include injectable drugs (capreomycin, streptomycin, kanamycin, or amikacin), fluoroquinolones (ciprofloxacin, ofloxacin, or later-generation fluoroquinolones), ethionamide, or prothionamide, para-aminosalicyclic acid, and cycloserine. The WHO (2019) revised the guidelines and currently suggest that streptomycin and amikacin are to be considered only if DST results confirm susceptibility and adequate measures to monitor for adverse reactions can be ensured. In addition, capreomycin and kanamycin are not to be included in the treatment of MDR-TB patients. 4 Currently, in general, the selection of antimicrobials for the treatment of an individual patient with TB is based, apart from drug availability, on issues such as the mycobacteria’s drug-resistance pattern and assumptions about the chosen drugs (mechanism of action, potential toxicity, known pharmacological interactions, possible development of drug resistance due to previous use, etc.). However, the actual role played by each drug in the therapeutic outcome is difficult to assess because of the large number of possible combinations of antimicrobials and the potential yet undiscovered interactions affecting their pharmacokinetic and/or pharmacodynamic properties. The statistical approach proposed by the present study takes advantage of the IPD meta-analysis in order to unveil treatment effect modifiers, either regarding medications or other variables at the individual-level, with the aim to better treat and understand not only TB, but also other diseases, including those requiring the concurrent use of several drugs such as arterial hypertension, diabetes, etc.
As discussed in section “Parameter of interest and assumptions”, we require unconfoundedness when estimating the causal effect. However, in practice, this condition is non-testable and we are limited to the covariates collected by the original studies. While most important confounders were collected across all studies, we did not receive DST results from many studies. However, the results were available to clinicians who may have responsively changed or added medications, making DST results an important confounder. Our analyzes adjusted for all available drug sensitivity information.
Our results suggest that cycloserine may enhance the effects of ethionamide but capreomycin and kanamycin or amikacin were associated with reduced effects. In addition, streptomycin may reduce the effect of cycloserine. Our findings support the revised WHO guidelines that injectable drugs are not as beneficial as previously believed. The analyzes informing the 2019 WHO guidelines (Ahmad et al.) showed that kanamycin and capreomycin were associated with worse outcomes. 40 In our results, these drugs were associated with worse outcomes in regimes with ethionamide. For the six individual characteristics, age, sex, acid fast bacilli status, HIV infection, cavitation status on chest radiography, and past TB history, they might be potential for effect modifiers since we did not find any evidence of effect modification. Being that MDR-TB an infectious disease, this finding may simply reflect that antimicrobial drugs play a much larger and definitive role for success than other characteristics at the individual level.
Since our MDR-TB data were identified from studies carried out up to 2009, we have no information about both new and repurposed effective anti-TB drugs.41,42 Upcoming work will apply our methods to data from more recently treated patients. Another limitation of our approach is that, because we only considered the effect of intervening on one treatment at a time, we cannot directly address how to select combinations of medications that would be expected to optimize the probability of treatment success. Previous work evaluated the causal contrasts between different regimens of concurrent medications in MDR-TB. 43 Future applications should directly address the more challenging question of treatment-treatment interactions on the outcome which would directly allow for the evaluation of optimal medication usage. Other ongoing work in our group involves using LASSO, 33 rather than hypothesis testing, to select the effect modifiers in the linear MSM for the CATE. This may improve upon the current work by utilizing a superior approach to variable selection.
Identifying effect modifiers is an important step for estimating subpopulation causal effects that can help guide treatment decision making for individual patients. However, such analyzes require larger amounts of data than the estimation of average treatment effects. This analytic approach can generate hypotheses for drug combinations which can be tested in randomized controlled trials. We have also contributed by extending existing doubly robust methods to incorporate multiple data sources. Advances in IPD meta-analysis enable researchers to incorporate multiple sources of previously collected observational data in their analyzes in order to increase their power, which is greatly beneficial for the identification of effect modifiers.
Supplemental Material
sj-tex-1-smm-10.1177_09622802211046383 - Supplemental material for Modeling treatment effect modification in multidrug-resistant tuberculosis in an individual patientdata meta-analysis
Supplemental material, sj-tex-1-smm-10.1177_09622802211046383 for Modeling treatment effect modification in multidrug-resistant tuberculosis in an individual patientdata meta-analysis by Yan Liu, Mireille E Schnitzer, Guanbo Wang, Edward Kennedy, Piret Viiklepp, Mario H Vargas, Giovanni Sotgiu, Dick Menzies and Andrea Benedetti in Statistical Methods in Medical Research
Footnotes
Acknowledgments
This work was supported by the Canadian Institutes of Health Research [Project Grant to MES and AB, and a New Investigators Salary Award to MES]. The data and context were provided by the Collaborative Group for Meta-Analysis of Individual Patient Data in Multidrug-Resistant Tuberculosis.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
