17th Annual University of Pennsylvania Conference on Statistical Issues in Clinical Trials—Covariate Adjustment in Randomized Clinical Trials: New Methods and Applications (Morning Panel Discussion)

Abstract

DEVAN MEHROTRA: Our panelist is Courtney Schiffman from Genentech. Dr. Schiffman graduated from UC Berkeley with a PhD in Biostatistics in 2019. She has worked as a statistician at Genentech Roche on several Phase III clinical trials for respiratory and gastrointestinal indications. In addition to working as a study statistician, she’s actively working with colleagues at Genentech Roche on methods development and on how to help make covariate adjustment more widely used by study statisticians, by preparing presentations and public blog posts on how to practically and rigorously implement covariate adjustment in clinical trials.

COURTNEY SCHIFFMAN: Thank you so much for inviting me to be a part of this panel today and for the great introduction. Michael Rosenblum^† and I wanted to ask some questions and make comments focused particularly on practical considerations that study statisticians are keenly interested in when applying covariate adjustments. My comments and questions today will be very much in that vein: What are practical considerations that study statisticians are keenly interested in? My first question I’ll preface with a comment that we recently received feedback from European Clinical Trials Registry (EUCTR) indicating that we had to clearly state the covariates to be included in our working regression model for our G-computation standardization estimator. And we had to provide justification for including those covariates in our working regression model. So, my question to the panel is as follows: What justification or support do we need to be including for decisions made around the form of the working regression model?

And this could include the number of covariates to adjust for, whether or not we include treatment by covariate interactions, and which covariates we adjust for.
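Editorial illustration (not part of the panel discussion): for readers less familiar with the estimator under discussion, the following is a minimal sketch of G-computation (standardization) with a linear working regression, with or without treatment-by-covariate interactions. The function name, the simulated data, and all numeric values are assumptions made for illustration and do not come from any trial mentioned by the panel.

```python
import numpy as np

def g_computation_difference(y, a, x, interactions=True):
    """Standardized (marginal) difference in means from a linear working model."""
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)

    def design(arm):
        # Intercept, treatment indicator, baseline covariates.
        cols = [np.ones(len(y)), arm, x]
        if interactions:
            cols.append(arm[:, None] * x)  # treatment-by-covariate terms
        return np.column_stack(cols)

    # Fit the working regression on the observed data.
    beta, *_ = np.linalg.lstsq(design(a), y, rcond=None)
    # Predict every participant's outcome under each arm, then average.
    mean_treated = (design(np.ones(len(y))) @ beta).mean()
    mean_control = (design(np.zeros(len(y))) @ beta).mean()
    return mean_treated - mean_control

# Simulated trial (hypothetical data-generating values, illustration only).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
a = rng.integers(0, 2, size=n)  # 1:1 simple randomization
y = 1.0 * a + 2.0 * x[:, 0] + 0.5 * a * x[:, 1] + rng.normal(size=n)

estimate = g_computation_difference(y, a, x)
unadjusted = y[a == 1].mean() - y[a == 0].mean()
print(f"G-computation: {estimate:.3f}, unadjusted: {unadjusted:.3f}")
```

In this sketch the choices the question refers to (which covariates enter `x`, and whether `interactions` is switched on) change only the working model; the quantity being estimated remains the marginal difference in means.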

STUART POCOCK: My immediate thought is life’s not fair in this sense that we never know enough about the covariates at the time. Even though we may have done lots of trials before, we still don’t really know in advance the right covariates to put in because they’re prognostic and what’s the shape of their relationship with the prognostic outcome. We’re always struggling to specify well enough what the covariate model should be. And therefore, for the regulators to say you must have it all pinned down in advance precisely is what I mean by not fair. But then otherwise, another idea that someone was putting to me—and I forget who it was, and they’ve written about it—is on the blinded trial data, you could produce a prognostic model. You haven’t broken the treatment code, and you determine which are the predictors within your own trial.

And so you then have still used the data pre-specified in a modeling context. Are you allowed to do that? And my immediate response—I think it was Tim Morris in London who was saying this to me last week, now I come to think of it. And my immediate response was, “You’ll never get away with that.” But his response was we should be allowed to do that, and then you’d have the chance to see what predicts and then apply it to the treatment effect. I don’t know whether there’s any mileage in that.

KELLY VAN LANCKER: Does it work? Yes. I agree with the proposal to select the model in a blinded way. This would not harm inference. At the moment, in causal inference, it’s possible to prove that, especially in randomized trials using methods like G-computation (as I showed today) that you can do variable selection, using methods like Lasso or stepwise, or even more advanced machine learning approaches, like Laura will be talking about later. You can do that without an increase in Type I error, even without blinding. This all has to do with the fact that these methods are what we call doubly robust.¹ In randomized trials, this also leads to valid inference. From that perspective, it is not necessary to pre-define everything in advance, because let’s be honest, that’s an unfair task, a really difficult one.²

I want to mention a situation in consulting where the team really wanted to do covariate adjustment. Perhaps, it was a new disease area for that company. And they said, “We want to do covariate adjustment.” Then in the end, they didn’t do it because they didn’t know in advance which covariates would be important, which would be prognostic. And I think that’s sad because in the end they didn’t have the power gain or the efficiency gain. So I’m quite pro doing the selection in your trial. And I have seen people doing it in a blinded way. But I think, with some methods such as augmented inverse probability weighting or G-computation with Lasso, it’s not even necessary to do it in a blinded way. So, I would then advocate to even not do it blinded—do it unblinded. Although that’s completely against what’s in the guidance at the moment.

STUART POCOCK: So we need to work on the guidance, yes? [laughter]

KELLY VAN LANCKER: I mean, these are recent developments, right? So that takes time.

STUART POCOCK: Yes, yes, indeed.

COURTNEY SCHIFFMAN: But I would second updating the EMA (European Medicines Agency) guidance on covariate adjustment for sure. Anqi, anything to add, or should we move to the next question?

ANQI ZHAO: First, a huge disclaimer that I’m a pen and paper statistician proving asymptotics in my office. What I am about to say could be very off. So three, possibly very biased, knee-jerk reactions. The first is sort of a no-regret thing. Whenever we want to do covariate adjustment, make sure that they are indeed baseline covariates instead of post-treatment covariates, because post-treatment covariates indeed cause a lot of problems. And second, a more existential question regarding whether we want to do regression adjustment in the first place. I published some papers in this field, but in practice, maybe the answer is not so easy as yes, because all the theoretical results I proved are asymptotic. And it came up during a conversation during the break that in reality, everything is about finite samples, and we don’t know about the bias and everything. Asymptotically, more efficient, so what? Indeed, the decision-making in practice is a much more complicated problem than proving asymptotics in my office.

I think the third point is regarding whether to include interaction. From an asymptotic efficiency perspective, yes, definitely. But even compared with the canonical analysis of covariance (ANCOVA). First of all, canonical ANCOVA works in a majority of situations, and all the simulated results I showed in my papers are, again, highly cherry-picked examples to exaggerate its potential harm when certain conditions are violated. And second, back to the finite sample consideration. Interactive regression basically doubles the number of covariates. And that’s going to cause a huge increase in finite sample variability, as the trade-off between finite sample efficiency versus asymptotic efficiency.

KELLY VAN LANCKER: Regarding the interactions, I agree that your degrees of freedom are changing. But I have done simulations with and without interactions, and what I saw is that it’s not really hurting that much to include interactions, but it’s also not really giving you a lot of gain in many real data sets.

STUART POCOCK: My thought on interactions is it changes the game totally because you’re no longer getting a single estimate of a treatment effect that applies to the whole population, covariate-adjusted. Because you’re then changing the model—and usually with concerns about whether the interactions with treatment are really strong or not. Interactions between covariates are just getting the right model, I presume one means interactions with treatment. And those are into subgroup analyses with all the traumas that involves. One thing to comment, though, is you can have a model on a relative scale, such as a hazard ratio scale, which has no interactions. But then if you translate it to an absolute scale, then there are interactions. So interactions, the concept of them is model-specific. So beware, it makes the whole thing more complex. And a trial likes to have a single overall covariate-adjusted result unless the interactions are so strong that you must look beyond that.

KELLY VAN LANCKER: What we, of course, do when we are targeting this marginal estimand is bringing it back to one estimate. But of course, you then lose the information you had on the interactions. So not working conditionally has that advantage, but of course, it has a disadvantage that you lose the interpretation of the different subgroups. Although you can do another analysis for subgroups afterwards.

COURTNEY SCHIFFMAN: Okay. Thank you very much. My next question was going to be on the handling of stratified randomization. I thought I was going to come here and stir the pot by suggesting you should instead prioritize baseline covariates that are the most prognostic. I think I’m preaching to the choir if I were to say that. So if regulators come knocking, Stuart, I’m just going to send them to you to justify our use of adjusting simply for the covariates that are the most prognostic. Whether that’s an original form of a stratification factor, not the dichotomized one, or some other factor from maybe an external prognostic model. So that leads me into my next question. Dan, you mentioned sponsors who can develop a prognostic model on external, either historical clinical trial data or real-world data, and then use the predictors, the sum score from that prognostic model as a covariate in your clinical trial at hand. Were sponsors to do this, what do we have to show? What do we have to provide to regulators in terms of the data-handling strategy, how the model was developed, etc., to derive this score that is adjusted for in the upcoming clinical trial?

DANIEL RUBIN: Thank you for the really great question. I do think that forming a prognostic index using external data and then using that as a covariate in an adjustment model in a prospective trial is generally a safe method. Because of all this robustness we’ve been talking about with covariate adjustment, it can be thought of as another type of baseline covariate. And because we know that adjusted analysis is generally a reliable method, this should be a reliable method as well. I think in terms of what could be some considerations for this type of method, if you use some type of, say, complicated machine learning method with external data to create your prognostic score, if it’s not as correlated with the outcome in your prospective trial as it has been in external data, then you might not get the precision gains that you had been hoping for. But nevertheless, that doesn’t invalidate the method. I think that that could be an issue.

And then another issue really is reproducibility. I don’t think that the Food and Drug Administration (FDA), at least, would be looking for all the details of did you optimally create this prognostic score using external data? But once you have it, is it some kind of fixed covariate that we can define with some type of algorithmic approach so that it really is a pre-specified covariate. Hopefully, I didn’t butcher that explanation too badly. But overall, I would say that this is generally something that’s covered by our guidance and would be considered a valid method to the FDA.
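Editorial illustration (not part of the panel discussion): the workflow Dr. Rubin describes, in which a prognostic score is derived once from external data and then treated as a fixed, pre-specified baseline covariate, might be sketched as follows. The linear score, the simulated "historical" and "new trial" data, and all coefficients are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, effect):
    """Hypothetical data: five baseline covariates, three of them prognostic."""
    x = rng.normal(size=(n, 5))
    a = rng.integers(0, 2, size=n).astype(float)
    y = effect * a + x @ np.array([1.0, 0.8, 0.5, 0.0, 0.0]) + rng.normal(size=n)
    return x, a, y

# Step 1: fit the prognostic model once, on external data only.
x_hist, _, y_hist = simulate(5000, effect=0.0)
design_hist = np.column_stack([np.ones(len(y_hist)), x_hist])
frozen_coef, *_ = np.linalg.lstsq(design_hist, y_hist, rcond=None)

def prognostic_score(x):
    """Fixed function of baseline covariates; never refit on the new trial."""
    return frozen_coef[0] + x @ frozen_coef[1:]

# Step 2: in the new trial, adjust for the single frozen score.
x_new, a_new, y_new = simulate(400, effect=0.5)
score = prognostic_score(x_new)
design_new = np.column_stack([np.ones(len(y_new)), a_new, score - score.mean()])
coef, *_ = np.linalg.lstsq(design_new, y_new, rcond=None)
adjusted_effect = coef[1]
residuals = y_new - design_new @ coef

unadjusted_effect = y_new[a_new == 1].mean() - y_new[a_new == 0].mean()
print(f"adjusted: {adjusted_effect:.3f}, unadjusted: {unadjusted_effect:.3f}")
```

Because `frozen_coef` is fixed before the new trial is analyzed, the score is reproducible from the baseline data alone, which is the property emphasized above. If the score turns out to be less prognostic in the new trial than in the external data, the sketch still runs unchanged; only the precision gain shrinks.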

STUART POCOCK: I would just say that having determined the risk factors, the variables that are in the score, I would not then use those same weights that you derived from the past data. I would allow the variables to find their own relative importances as their own coefficients in a fresh modeling because it may be that some variables in a score really matter in this next trial and others don’t. I would allow free rein on the coefficients of the variable.

COURTNEY SCHIFFMAN: Thank you very much for the point. What if your model budget does not allow for you to adjust for all those covariates individually?

STUART POCOCK: You mean you didn’t collect them all?

COURTNEY SCHIFFMAN: No. What if your sample size doesn’t allow for you to adjust for that many covariates individually?

STUART POCOCK: Oh, that’s a good point. No. So I said I’m into big trials, so I don’t meet those problems usually. But if you do happen to have a smaller trial, then—that’s a good point—you may then need to go for the model you derive from the past trial. That’s right. That’s a good point. Yes.

COURTNEY SCHIFFMAN: May I ask one more follow-up question? What if you have a larger Phase III trial but with a binary endpoint, and the number of events is expected to be small in both treatment arms or in a given treatment arm, say, the sham arm? Does that affect how many covariates you can adjust for? Does it really limit the number?

STUART POCOCK: Well, I think the simple answer is yes, and the US FDA guidance said it should. Numerically, to say how many you’re allowed, I don’t think we get to that numerical subtlety, do we? But yes, too many covariates and a small trial would be getting out of hand, and model instability might ensue, I’d have thought.

COURTNEY SCHIFFMAN: Any guidance, Kelly or Anqi, on how many covariates you can adjust for in such a setting?

KELLY VAN LANCKER: Not too many. Your standard errors are often performing badly (i.e., biased), when you are adjusting for too many covariates compared with the number of events. So at the moment—and that comes back to your first question—there is no real guidance on how many covariates we can adjust for. It also depends on the setting. I usually limit it to—if I have a sample size of hundreds—not more than five covariates, for example. Or do selection. And to come back to the prognostic score, I guess, yes, that was a really good question regarding what if you have too many covariates in your prognostic score that you cannot adjust for individually. I think what’s also important if we are adjusting for this prognostic score is maybe try to find still the five most important ones. Also add them to our model so that if they were modeled in a wrong way, if the association was different in the previous trial compared with this one, then you can have a benefit from the individual covariates. If the prognostic score was good, then you can have benefit from both.

ANQI ZHAO: Regarding the propensity score, do you mean including propensity score as an additional covariate?

KELLY VAN LANCKER: I actually meant prognostic score or prognostic index, which is just fitting your outcome model on a previous data set.

ANQI ZHAO: I see. Sorry, I misheard.

COURTNEY SCHIFFMAN: All right. Thank you very much. I’ll wrap it up for our audience questions.

DEVAN MEHROTRA: Okay. Thank you very much. We have plenty of time for robust discussion, and wonderful to see some hands raised already.

YIMEI LI: Yimei Li, from the University of Pennsylvania, I have a couple questions. The first question is as follows: when I did a trial a long time ago, and adjusted for covariates, I was criticized by causal purists. The criticism was that I shouldn’t be adjusting for covariates because one of the arguments is that the clinical trial is to obtain causal inference, assuming everyone’s treated versus no one’s treated. Especially in non-continuous outcome, that can be a problem. If you adjust for covariates, that means your inference is conditional. For example, conditional on how old you are, what sex you are. So that’s my first question. The second question is that I see in Dr. Pocock’s talk that adjusting covariates didn’t seem to make a difference? But Dr. Zhao says that adjusting for covariates makes a lot of difference in terms of efficiency. My question would be when does it make a difference, when doesn’t it make a difference?

STUART POCOCK: Yes. I think it always makes some—no, always is a big word. Never say always, never say never. It virtually always makes some difference. It’s a matter of how much difference. And I think that depends. To my mind, the strength of the prognostic properties of the covariates you’re adjusting for drives that. I think it’s always worth doing. I think it’s a free ride. I mean, you’ve got the data, assuming you’ve collected the covariates. You’ve got the data, and therefore, you should do it. I did rehearse my talk, even though it might not have shown. And a colleague at the end said, “So why don’t the people doing the really big trials bother with covariate adjustment? Is it kind of an intellectual arrogance that they don’t bother to do it?” And I’m picking up on a point of Dr. Rubin, which is I think it does matter in both big trials and small trials to adjust. It’s not related to one size or another.

And therefore, I think I’d like to persuade more people doing the 20,000-patient trials that it really is worth adjusting for covariates. But they usually say, “Look, I’m so big, it doesn’t matter.” And they’re not quite right. But which one is going to make a difference? That’s the lottery. Because it depends on—it’s not only the strength of the covariates. It’s how they are imbalanced between treatment groups, and you can’t possibly know that in advance, of course. And so you never know, in your next trial, whether the covariate adjustment will make a difference or not, and that’s the mystery of it all, really.

ANQI ZHAO: Yes. It’s indeed paradoxical in a sense that if all covariates are perfectly balanced, then there should be no gain because the covariates are

STUART POCOCK: Oh, no, no, no, no.

ANQI ZHAO: orthogonal to the

STUART POCOCK: No, there’ll be a gain. There’ll be a gain. If you’re perfectly matched, you will improve—and it’s logistic or proportional hazards. You will pull the estimate away from the null by covariate adjustment, even when it’s perfectly matched.

ANQI ZHAO: I see.
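Editorial illustration (not part of the panel discussion): Professor Pocock's point about logistic models can be checked with simple arithmetic. The sketch below assumes two equally sized strata that are perfectly balanced between arms, a common conditional odds ratio of 3, and control-arm risks of 0.2 and 0.8; all of these values are hypothetical.

```python
def odds(p):
    return p / (1.0 - p)

def risk_from_odds(o):
    return o / (1.0 + o)

conditional_or = 3.0        # same odds ratio within each stratum (assumed)
control_risks = [0.2, 0.8]  # control-arm risk in each stratum (assumed)

# Treated-arm risk in each stratum implied by the conditional odds ratio.
treated_risks = [risk_from_odds(conditional_or * odds(p)) for p in control_risks]

# Strata are equally sized and balanced, so marginal risks are simple averages.
marginal_control = sum(control_risks) / len(control_risks)
marginal_treated = sum(treated_risks) / len(treated_risks)
marginal_or = odds(marginal_treated) / odds(marginal_control)

print(f"conditional OR = {conditional_or:.2f}, marginal OR = {marginal_or:.2f}")
# prints: conditional OR = 3.00, marginal OR = 2.08
```

With no imbalance at all, the covariate-adjusted (conditional) odds ratio of 3 is further from the null than the unadjusted (marginal) odds ratio of about 2.08. This is the non-collapsibility of the odds ratio, and it is a different phenomenon from the variance reduction seen with linear models.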

DEVAN MEHROTRA: Kelly, do you want to add anything?

KELLY VAN LANCKER: Not really. I guess everything is said. Maybe one covariate that I always include is the baseline measurement of my outcome because that often has so much information compared with other covariates. If you are wanting to make a selection in advance, that one you should not forget. Of course, for binary endpoints, it’s always a bit more tricky, but with the continuous endpoints, it can tell you a lot about your primary endpoints.

DEVAN MEHROTRA: Dan, did you want to comment on the question? I know, in your presentation, you made the point that, oftentimes, covariate adjustment may not hurt you, but it could help you. Did you want to build on that or any other comment?

DANIEL RUBIN: Right. I think Professor Pocock said it very well. It might be sort of a free lunch in some sense. But it was humbling to see his talk and survey of the trials in which covariate adjustment did change the side of statistical significance one was on. I guess maybe I would just add that might not be the only kind of metric to look at to see if covariate adjustment is improving precision, and it might differ in other therapeutic areas or in future trials. But it was interesting and humbling to see those results.

DEVAN MEHROTRA: Just one other point on that. Stuart, you presented a very nice example, where in one trial, the covariate adjustment did appear to help. You got an improvement in the Z-statistic. And the companion trial that came later actually went in the other direction. So does that serve as a reminder for us that sometimes a covariate adjustment could actually fix an imbalance that was artificially favoring the treatment arm, and this now brings you back on track?

STUART POCOCK: Yes, that’s right. You need to come clean when you’ve been lucky, might be a way to express it, and say, “I confess we had an imbalance favoring my treatment, and now I’m going to correct for it. And at the same time—but I didn’t lose anything by correcting for it. I just feel better about it.” It should be the philosophy perhaps.

DEVAN MEHROTRA: We have plenty more questions.

FRANK HARRELL: Frank Harrell, Vanderbilt University, Biostatistics. This is a comment on Stuart’s initial question about, can you do blinded covariate selection for ANCOVA? And I think the literature is pretty conclusive that you don’t really mess up your regression coefficients of treatment that way, but all the standard errors are basically ruined. So all the standard errors will be underestimated by that procedure. Standard errors of the covariates themselves and the standard error of treatment will be falsely low if you do variable selection on the same data set, even if it’s blinded.³

DEVAN MEHROTRA: Thank you. I think there might be some disagreement, but please go ahead.

STUART POCOCK: No, Frank, I think you’re right. Can you fix that in some way?

FRANK HARRELL: It’s better to use penalized regression.

DAVID WRIGHT: David Wright from AstraZeneca, Cambridge, UK. Thank you. To Courtney’s point about the EMA revising the guidance, just to reassure you, I was talking to them recently, and they are revising the guidance. That should come later this year, I understand. But I’d like to ask a question regarding reproducibility of the prognostic score. So historically, if we take something like the QRISK score—I think it’s the QRISK3 score now. It changes over time. Imagine one sponsor has a score, and that’s in their label. At the same time, there’s another sponsor, and they get some data from somewhere else, so obviously, they’ll have a different score. But that’s sort of not quite what normally happens in medicine. In medicine, normally, it will be the medical community saying what the prognostic score is before we then put it in a trial. So does the panel have any comment on the way to navigate that complex discussion?

STUART POCOCK: My immediate comment is, by definition, all prognostic scores derived from past data are out of date by the time you’re thinking of using them on your next trial. And also, your next trial will have different eligibility criteria, no doubt, from whatever the previous trials were. So there’s always a limit of generalizability. But it won’t do any harm to use the best score going in the past, probably. But don’t get too full of yourself in thinking it’s the answer. It’s helpful, but not the answer.

DAVID WRIGHT: Okay, I’ll try a different tack. In the label at the end, how can we explain what we’ve done? And so again, you’ve got the two sponsors. They’ve obviously got two different models, and that’s got to be communicated in the label to explain what’s happened. Is that an issue at all?

STUART POCOCK: You mean with the new trial producing a covariate-adjusted estimate by some means?

DAVID WRIGHT: Yes, and then a different sponsor will have a different covariate adjustment method. So historically, it’d be nice and straightforward—even though I agree with you that you shouldn’t stratify—that you’d say, “Well, I stratify for these three things, and that’s what’s in the model,” and everyone does the same. But that’s not going to be the case.

STUART POCOCK: I would think a covariate adjustment is heading in the direction of matching in a way. They’re different, but they’re slightly analogous. You come up with a result for a trial, and you’re saying, “Having taken into account the profile of a patient, on average, I am declaring that the treatment effect is such and such.” And if somebody did a different technique of adjustment, they could still, in that soft way I just did, use the same language, perhaps. And it’s a bit like saying, “Stuff the details,” isn’t it?

DAVID WRIGHT: A little bit, but you were more helpful than just that, I think. Thank you.

DEVAN MEHROTRA: Any other panelists wish to comment? If I understood that question, just for everyone’s benefit, you might have one sponsor who’s adjusted for covariate X1. They have a label. You have a competitor who’s come up with a similar drug. They’ve adjusted, in their pivotal trial, for covariate X2. They have a different label. And the question is, does it matter? Because in the end, I would think if you have the same estimand, which doesn’t matter whether you adjust for X1 or X2—if you have the same estimand, it shouldn’t matter. But let the experts really say what their thoughts are.

KELLY VAN LANCKER: I mean, yes, if you’re focusing on a marginal or an unconditional estimate, that’s true, but while I have no experience with labeling, I think I agree that we should then still say in the label somewhere what we adjusted for.

DEVAN MEHROTRA: Dan, would you like to comment on this?

DANIEL RUBIN: Sure. I think that it’s an issue that can be navigated in labeling, just by really describing what—if an adjusted result is presented, what covariates are included in that model. And if it differs between trials, I just think that the difference in covariates in those settings could be described. It might become more complicated if there are, say, subgroup results in labeling if the prognostic score for one label means something else than the prognostic score for a different label. You would want that to be clear if you’re dealing with a score that was being updated over time. But I think it’s an issue that probably could be word-smithed in labeling.

DEVAN MEHROTRA: Any other comments from the panel? If not, we go on to the next question.

UNIDENTIFIED AUDIENCE MEMBER: I think, just in follow-up to that, you’re going for the same treatment effects, so you should be, theoretically, observing similar treatment effects in those two scenarios, I think. So that should be described or mentioned.

DEVAN MEHROTRA: Thank you. Let’s go to a question online.

PAMELA SHAW: Yes. This is coming from John Tamaresis from Stanford University. And he asked the question, “Can covariate adjustment be used in intent-to-treat analyses?” And I’ll add my interpretation here, is the minute you add covariates, you’re adding missing data to your, otherwise, complete data problem. And the more covariates you have, the more missingness. Maybe thinking about that in that context. So how do you think about this and the intent-to-treat analyses?

STUART POCOCK: Well, I assumed one was doing intention to treat analyses throughout. All the examples I had any control over were intention to treat analyses.

PAMELA SHAW: So the missing data element then—if you have missing data in covariates, whether it’s in a 0.5% of your cohort or 5% of your cohort that are missing baseline covariates, how are you adjusting that? How are you adjusting your intent-to-treat analysis for that?

STUART POCOCK: Well, I took intention to treat to do with what happens after randomization and follow-up. If we now retreat back to the other point, which is missingness of covariates, I was thinking of asking the same question. At what level of missingness of a covariate does it become untenable to still keep it in the analysis, or should you be dropping it? And that’s a tricky one because it depends on how strong a predictor it is in those where you have it. So I once had a study where the ejection fraction was an incredibly powerful predictor of—I think it was a survival outcome, but it was missing in a lot of patients in the cohort we had. And therefore, we struggled to keep it in because we think it was so important. Feeling guilty at the same time by—remember, multiple imputation means making up data. Never forget that. And therefore, I think there’s an issue of, are you stretching it too far? And I don’t know the answer. I think it’s a very good point.

ANQI ZHAO: This is related to the work that I mentioned that I didn’t have time to talk about today regarding the missingness in covariates. Again, all our results are asymptotic, as the sample size goes to infinity. We proposed a practically straightforward method called the missingness indicator method. We didn’t invent it. It has been in the literature for a long time, but we just rediscovered it and showed that it’s convenient to use and it ensures efficiency gain under relatively reasonable assumptions. But again, all our results are asymptotic. The method is like this. We include a column of missingness indicator for the missing covariates. If it’s missing, then it’s one. And then if it’s not missing, then it’s zero. And then we just impute the missing dimensions of the original unit by whatever your lucky number is. And then we can show that if we add both covariates into the regression specification, if we run the interacted regression, then asymptotically the resulting regression estimator would ensure asymptotic efficiency. So it’s by no means the best estimator, but it’s an estimator that ensures efficiency gain over the unadjusted difference-in-means estimator asymptotically.
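Editorial illustration (not part of the panel discussion): the missingness indicator method as described in this answer can be sketched in a few lines. The sketch assumes a single baseline covariate that is missing independently of treatment, fills the missing entries with an arbitrary constant, adds the indicator column, and fits a regression with centered covariates and treatment-by-covariate interactions. The function names and simulated values are hypothetical.

```python
import numpy as np

def interacted_estimate(y, a, covariates):
    """Treatment coefficient from a regression with centered covariates
    and treatment-by-covariate interactions."""
    z = covariates - covariates.mean(axis=0)
    design = np.column_stack([np.ones(len(y)), a, z, a[:, None] * z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

def missingness_indicator_estimate(y, a, x, fill_value):
    missing = np.isnan(x)
    x_filled = np.where(missing, fill_value, x)   # impute with any constant
    covariates = np.column_stack([x_filled, missing.astype(float)])
    return interacted_estimate(y, a, covariates)

# Simulated trial (hypothetical values, illustration only).
rng = np.random.default_rng(2)
n = 1000
x_full = rng.normal(size=n)
a = rng.integers(0, 2, size=n).astype(float)
y = 1.0 * a + 2.0 * x_full + rng.normal(size=n)
# About 20% of baseline values go missing, independently of treatment.
x_obs = np.where(rng.random(n) < 0.2, np.nan, x_full)

est_zero = missingness_indicator_estimate(y, a, x_obs, fill_value=0.0)
est_seven = missingness_indicator_estimate(y, a, x_obs, fill_value=7.0)
print(f"fill 0: {est_zero:.6f}, fill 7: {est_seven:.6f}")
```

In this sketch the two estimates agree up to floating-point error: changing the fill value only shifts the imputed column by a multiple of the indicator column, which the regression already contains, so the "lucky number" does not affect the treatment coefficient.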

DEVAN MEHROTRA: Is there any assumption about the missingness mechanism for that?

ANQI ZHAO: Yes, definitely, because that’s the core assumption, just the same missing at random assumption that I made here. So it assumes a kind of independence between the covariate missingness mechanism and the treatment status.

DEVAN MEHROTRA: Dan, did you have any comments on in a pivotal trial, when FDA looks at it? Any comments on if you have missing data in covariates, is there any guidance that you can provide for what the sponsors should be doing?

DANIEL RUBIN: Sure. My understanding is that covariate adjustment tends to be fairly robust to missing pre-randomization baseline covariates. Either using imputation or having missingness indicators generally doesn’t invalidate the method. Now, as soon as you have missing outcome data and missing post-randomization data, then there are many more assumptions that can come into play. That said, there aren’t a lot of trials I’ve experienced where missing baseline data has been kind of a larger problem than missing outcome data. Usually, it’s the reverse. The one case I’ve seen is in antiviral trials where you might have some type of remote participation where the outcome is easy to obtain, but there can be baseline covariates that maybe necessitate some type of swab or measurement of viral load that can go missing for people. But overall, I would say that if you can measure your outcome, that covariate adjustment tends to do okay, even if you have missing baseline covariates.

DEVAN MEHROTRA: And I will point out that about 20 years ago, we had a vaccine clinical trial. We were looking at immune responses, what happened relative to the baseline, and because of some complicating circumstances, about 20% of the samples were damaged. As 20% of the subjects were missing their baseline, this had motivated the Merck statisticians to go toward what’s called the constrained longitudinal data analysis (CLDA), which accommodates all the data post-baseline, even if you’re missing a baseline. It’s a very neat way developed by the folks at Hopkins. And of course, that’s for longitudinal data analysis (LDA) in a specific context where you’re missing just the baseline. Let’s go to more questions online.

PAMELA SHAW: We think this one was answered, but I’ll ask it on the behalf of a PhD student from Manchester. It was mentioned earlier, about the problems with collapsibility that can arise when working with nonlinear models. Their question related to how collapsibility relates to the choice of estimand or should it even guide your choice of estimand? And I thought I would open that up to the panel. My response was going to be—and Kelly, please correct me—that if you’re choosing a marginal estimand, then you’re not worrying about non-collapsibility; you’re maybe thinking about, “This is the estimand for this population.” Whereas if you’re choosing a conditional nonlinear model, like a hazard ratio or the odds ratio, then you do have to worry about collapsibility. So in some sense, it is a property of the estimand in nonlinear models.

KELLY VAN LANCKER: Yes, so I guess you have the problem of non-collapsibility, but of course, it’s not that the marginal estimand is solving everything, in the sense that we are assuming a superpopulation. I guess Frank Harrell would say that’s not realistic, and I understand where that is coming from. It’s not that a marginal estimand is the cleanest you can do. It gives you an idea about your whole population and your trial, but whether it’s ideal, not necessarily. But once you condition, there are, of course, other problems too, such as model misspecification and non-collapsibility. So I think the reason many of us present here today are focusing on marginal estimands is that we come from the causal inference world (and that’s what we often focus on there, although there are now many more unconditional estimands than before). On the other hand, I think it’s still the case that in a clinical trial (we have mentioned that a few times already) we want one number, and the marginal estimand is giving us that. So that’s, I guess, why many of us focus on the marginal estimand, but I would never focus only on the marginal estimand in a trial. I think the conditional one is also important, but you indeed have, with so many measures, the non-collapsibility problem.
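Editorial illustration: the non-collapsibility discussed above can be checked with a small exact calculation. The sketch below assumes a hypothetical randomized trial with one binary baseline covariate (prevalence 0.5) and a logistic outcome model with no treatment-by-covariate interaction; the coefficients are invented for illustration and come from no trial mentioned in this discussion.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

# Hypothetical model: logit P(Y=1 | trt, x) = B0 + B_TRT*trt + B_X*x
# All numbers are assumptions chosen only to make the effect visible.
B0, B_TRT, B_X = -1.0, math.log(3.0), 2.0
P_X = 0.5  # prevalence of x = 1; randomization makes it equal in both arms

def risk(trt, x):
    return expit(B0 + B_TRT * trt + B_X * x)

# Conditional (within-stratum) odds ratios: identical by construction.
cond_or = {x: odds(risk(1, x)) / odds(risk(0, x)) for x in (0, 1)}

# Marginal risks average over the covariate distribution.
marg_risk = {t: (1 - P_X) * risk(t, 0) + P_X * risk(t, 1) for t in (0, 1)}
marg_or = odds(marg_risk[1]) / odds(marg_risk[0])

# The risk difference is collapsible: the marginal value equals the
# covariate-weighted average of the conditional values.
cond_rd = {x: risk(1, x) - risk(0, x) for x in (0, 1)}
avg_cond_rd = (1 - P_X) * cond_rd[0] + P_X * cond_rd[1]
marg_rd = marg_risk[1] - marg_risk[0]

print("conditional OR by stratum:", cond_or)
print("marginal OR:", round(marg_or, 3))
print("marginal RD:", round(marg_rd, 4), "average conditional RD:", round(avg_cond_rd, 4))
```

Under these assumptions the odds ratio is 3 in both strata, yet the marginal odds ratio is about 2.4, with no confounding and no effect modification on the odds-ratio scale. The risk difference shows no such gap, which is one reason the choice of summary measure and the choice between marginal and conditional estimands are hard to separate.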

DEVAN MEHROTRA: Any other questions in the audience? Yes, please. And then after that, Frank Harrell has a follow-up question.

CHRIS JENNINGS: Chris Jennings, MD Anderson. This problem has been alluded to in a few cases, but I would like to know if you have any more insight or guidance. There are times when baseline covariates come with a cost, oftentimes a burdensome cost in terms of money, patient burden, or administrative overhead, and this is something we have to deal with as far as our design is concerned. So before we even think about including those variables, we have to collect them, and that affects the design and the associated costs. Do you have any guidance or insights beyond technical relative efficiency calculations that might bear on this consideration of the cost of collecting these baseline covariates?

STUART POCOCK: You raise an interesting point that’s often not discussed. I think when you’re doing large, simple trials, pragmatic ones, you often minimize the amount of data collection to keep costs down, not only monetary costs, but effort on the part of the investigators and cost of time. In those circumstances, you may not have the key covariates you’d really like to adjust for. I think that’s acceptable in some circumstances, given the need for large, simple trials. In other trials, people don’t usually fuss too much about the costs of the covariates. I haven’t usually heard it discussed. I suppose there are some biomarkers that are new and may still be expensive to measure, for instance. That could be an issue. Or doing a baseline exercise test on a patient will be very time-consuming. So you raise an interesting issue for which I don’t really have an answer.

DEVAN MEHROTRA: All right. We have time for one more question. Frank Harrell?

FRANK HARRELL: This is a comment about the missing indicator variable approach.³ If you’re using the missing indicator approach and trying to interpret the coefficients in the model (in other words, you’re not just using it as a plug-in to the later steps that you described), there have been some serious problems reported with that approach. Even under missing completely at random, the variable that’s missing has to be uncorrelated with all the other covariates, because if you use a missing indicator approach, you’re putting in a constant for the missing values. You’re creating a correlation of zero between that variable and the other variables, and that destroys your covariance matrix, which will bias the betas. In interpreting the model, not going to your full strategy, but just interpreting that one model, there are some pretty big biases in the betas under missing completely at random when the variable is correlated with other, non-missing variables.

STUART POCOCK: The missing-data specialist in my department back in London, James Carpenter, said the same as you, except much shorter. He said, “Don’t do it.” And Frank, why wouldn’t one do multiple imputation in that context if it really matters?

FRANK HARRELL: Yes.

ANQI ZHAO: Can I ask one clarification question? So regarding the beta, are you referring to the entire coefficient vector of covariates?

FRANK HARRELL: Perhaps it’s best to concentrate on whether the treatment effect is distorted. But any variable that’s correlated with a variable that’s missing will have its beta distorted.

ANQI ZHAO: So based on our theoretical results, maybe we somehow didn’t face that challenge in the asymptotic analysis that we did. Off the top of my head, I don’t know whether the magic is due to the interacted regression or to the design-based inference perspective that we took. In our theoretical framework, we assume that the randomization, the treatment indicator, is the sole source of randomness, and we condition on everything else. And I was wondering if that could be the magic, but this is just my wild guess, and I need to think about it more. Sorry that I don’t have a clear answer off the top of my head, but I do want to say that, if I recall correctly, we didn’t face the tricky thing that you just mentioned. But thank you so very much for bringing it up.

FRANK HARRELL: Just two quick comments. First of all, I’m never interested in asymptotics. Second, you’re using the missing indicator for a later step in which the particular problem I mentioned may get canceled out.

ANQI ZHAO: I see.

FRANK HARRELL: But if you’re using the model in its initial step, there are some serious problems with bias in the betas with the missing indicator method.

ANQI ZHAO: I see. Thank you so much.
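Editorial illustration: the exchange above can be made concrete with a small simulation. The sketch below is not any panelist’s analysis. It assumes a hypothetical linear outcome model with a randomized treatment, two baseline covariates with correlation 0.7, and 50% of the second covariate missing completely at random; all parameter values and the helper `ols` are invented for illustration.

```python
import math
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    a = [[0.0] * (p + 1) for _ in range(p)]  # augmented matrix [X'X | X'y]
    for r, yi in zip(rows, y):
        for i in range(p):
            for j in range(p):
                a[i][j] += r[i] * r[j]
            a[i][p] += r[i] * yi
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(a[k][c]))
        a[c], a[piv] = a[piv], a[c]
        for k in range(p):
            if k != c:
                f = a[k][c] / a[c][c]
                for j in range(c, p + 1):
                    a[k][j] -= f * a[c][j]
    return [a[i][p] / a[i][i] for i in range(p)]

rng = random.Random(0)
N, RHO, P_MISS = 20000, 0.7, 0.5  # assumed sample size, correlation, missing rate
full, mi, y = [], [], []
for _ in range(N):
    trt = float(rng.random() < 0.5)              # randomized 1:1
    x1 = rng.gauss(0, 1)
    x2 = RHO * x1 + math.sqrt(1 - RHO**2) * rng.gauss(0, 1)
    missing = float(rng.random() < P_MISS)       # MCAR missingness in x2
    y.append(0.5 * trt + 1.0 * x1 + 1.0 * x2 + rng.gauss(0, 1))
    full.append([1.0, trt, x1, x2])
    # Missing indicator method: fill x2 with a constant and add an indicator.
    mi.append([1.0, trt, x1, 0.0 if missing else x2, missing])

beta_full = ols(full, y)  # columns: intercept, trt, x1, x2
beta_mi = ols(mi, y)      # columns: intercept, trt, x1, x2 filled, indicator

print("x1 coefficient, full data:        ", round(beta_full[2], 3))
print("x1 coefficient, missing indicator:", round(beta_mi[2], 3))
print("treatment coefficient, missing indicator:", round(beta_mi[1], 3))
```

Under these assumptions the coefficient of the fully observed covariate x1 is pushed well above its true value of 1 by the missing indicator method (large-sample value about 1.46), because x1 absorbs part of the effect of x2 in the rows where x2 was filled in. The coefficient of the randomized treatment stays close to its true value of 0.5, since treatment is independent of both covariates and of the missingness. That pattern is consistent with both sides of the exchange: the covariate betas are distorted, while the treatment effect estimate in this simple setting is not.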

DEVAN MEHROTRA: Great. We have time for just one more question. Eric Tchetgen Tchetgen, I think you had a question.

ERIC TCHETGEN TCHETGEN: Thank you. I was going to stir controversy with Frank, but I’m going to stay away from that one in terms of asymptotics. But in the spirit of stirring some controversy, because we were charged with stirring some controversy, could you say something about dropout? There is one aspect of missing data that hasn’t really been discussed much, which is dropout, or censoring, which occurs whether or not the outcome of interest is time to event. As soon as you have follow-up, people drop out of your trial. And one of the first examples of covariate adjustment was one where you actually use post-baseline covariates to account for dropout in inverse probability of censoring weights, and it was demonstrated that this can increase your efficiency by a lot. And I wanted to push back a little bit on a statement that was made earlier about never using post-baseline covariates. I think the right statement is that you may use them for certain problems, and use them carefully, in a principled way. And you can get a big payoff from it. So dropout seems to be a much more fruitful area in which to consider covariates. And in fact, prognostic factors that are post-baseline and closer to the outcome are a really, really good set of covariates to use. So anyway, I just wanted to put that out there.

ANQI ZHAO: Thank you so very much for pointing that out. It echoes Stuart’s earlier comment: never say always, and never say never. So I retract what I said about never using post-treatment covariates. I haven’t had experience with that part of the literature, and therefore my comments could be very biased. Thank you so very much for pointing that out.
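Editorial illustration: the use of post-baseline covariates in inverse probability of censoring weights can be sketched in a few lines. The simulation below is a simplified assumption, not the example referred to above. It uses a hypothetical binary early-response marker, measured after randomization, that predicts both the final outcome and dropout, and every number in it is invented. It shows only the bias-correction role of the weights, not the efficiency gains mentioned in the discussion.

```python
import random
from collections import defaultdict

rng = random.Random(1)
N = 40000  # assumed total sample size

data = []
for _ in range(N):
    arm = rng.random() < 0.5
    # Post-baseline marker: early response, more common on treatment.
    resp = rng.random() < (0.8 if arm else 0.2)
    # Final outcome depends on early response and on treatment directly.
    y = 2.0 * resp + 0.2 * arm + rng.gauss(0, 1)
    # Dropout depends on early response only, so the outcome is missing
    # at random given (arm, resp) but not given arm alone.
    observed = rng.random() < (0.95 if resp else 0.4)
    data.append((arm, resp, y, observed))

# Estimate P(observed | arm, resp) by the empirical proportion in each cell.
n_cell, n_obs = defaultdict(int), defaultdict(int)
for arm, resp, y, observed in data:
    n_cell[(arm, resp)] += 1
    n_obs[(arm, resp)] += observed
p_obs = {k: n_obs[k] / n_cell[k] for k in n_cell}

def arm_mean(arm, weighted):
    """Mean outcome among completers, optionally weighted by 1 / P(observed)."""
    num = den = 0.0
    for a, resp, y, observed in data:
        if a == arm and observed:
            w = 1.0 / p_obs[(a, resp)] if weighted else 1.0
            num += w * y
            den += w
    return num / den

diff_cc = arm_mean(True, False) - arm_mean(False, False)   # completers only
diff_ipcw = arm_mean(True, True) - arm_mean(False, True)   # IPCW

print("true mean difference: 1.4")
print("completers-only estimate:", round(diff_cc, 3))
print("IPCW estimate:           ", round(diff_ipcw, 3))
```

Under these assumptions the completers-only contrast understates the true mean difference of 1.4, because completers over-represent early responders in both arms, while the weighted contrast recovers it. The marker is used only to model dropout, not as a regressor in the outcome comparison, which is one way of using a post-baseline covariate carefully.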

DEVAN MEHROTRA: Well, it’s time for lunch. Thank you all so much for the morning session.

Footnotes

ORCID iD

Mary E Putt

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Notes

References

1. Seaman, Vansteelandt. Introduction to double robust methods for incomplete data. Stat Sci 2018; 33(2): 184–197.

2. Lancker, Díaz, Vansteelandt. Automated, efficient and model-free inference for randomized clinical trials via data-driven covariate adjustment. arXiv, http://arxiv.org/abs/2404.11150 (2024, accessed 1 January 2026).

3. Donders, van der Heijden, Stijnen, et al. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol 2006; 59(10): 1087–1091.