Abstract
A cancer diagnosis is part of a complex stochastic process that involves patients' characteristics, diagnosing methods, an initial assessment of cancer progression, treatments and an outcome of interest. To evaluate the performance of diagnoses, one needs not only a consistent estimate of the causal effect under a specified regime of diagnoses and treatments but also a reliable confidence interval, P-value and hypothesis test for that effect. In this article, we identify causal effects under various regimes of diagnoses and treatments by the point effects of diagnoses and treatments, and are thus able to estimate and test these causal effects by estimating and testing point effects in the familiar framework of single-point causal inference. Specifically, using data from a Swedish prognosis study of stomach cancer, we estimate and test the causal effects on cancer survival under various regimes of diagnosing and treating hospitals, including the optimal regime. We also estimate and test the modification of the causal effect by age. Because the setting is simple, the example extends readily to a large variety of settings in cancer diagnosis: different personal characteristics such as family history, different diagnosing procedures such as multistage screening, and different cancer outcomes such as cancer progression.
1 Introduction
A cancer diagnosis is part of a complex stochastic process: patients’ personal and social characteristics influence the choice of diagnosing methods; diagnosing methods influence the initial assessment of cancer stage; cancer stage influences the choice of treating methods; and treating methods influence cancer outcomes such as cancer survival. To evaluate the performance of cancer diagnoses, one needs to estimate and test the causal effect under a specified regime of diagnoses and treatments, abbreviated as the sequential causal effect (SCE). An important example is the SCE under the optimal regime.
In causal inference for treatment sequences, Robins derived the well-known G-formula, which identifies the SCE under a regime of treatments via standard parameters.1,2 Based on Robins’ G-formula, a parametric likelihood-based approach has been developed for estimating the SCE via standard parameters.3,4 However, if the time-dependent covariates between treatments are both posttreatment variables of the previous treatments and confounders for the subsequent treatments, this approach may suffer from the curse of dimensionality and the null paradox.3,4 To avoid these problems, semi-parametric non-likelihood-based approaches have been developed, for instance, the marginal structural model (including doubly robust estimation) based on inverse probability of treatment weighting4–6 and G-estimation based on the structural nested mean model (SNMM) or optimal-regime SNMM.2,4,7 In a recent review on the optimal regime, Kosorok and Laber8 highlighted the need for reliable methods of evaluating the uncertainty in estimating SCEs (Section 6, statistical inference).
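For orientation, the standard form of Robins' G-formula for a two-stage sequence can be written as below. The symbols are assumptions introduced only for this illustration and are not the article's notation: $x_1$ denotes baseline covariates, $z_1$ the first treatment, $x_2$ the time-dependent covariate measured between treatments, $z_2$ the second treatment and $Y$ the outcome. The identity holds for a static regime $(z_1, z_2)$ under the usual consistency, positivity and no-unmeasured-confounding assumptions.

```latex
\[
  E\{Y(z_1, z_2)\}
  \;=\;
  \sum_{x_1} \sum_{x_2}
  E(Y \mid x_1, z_1, x_2, z_2)\,
  P(x_2 \mid x_1, z_1)\,
  P(x_1)
\]
```

The conditional mean $E(Y \mid x_1, z_1, x_2, z_2)$ and the conditional distribution $P(x_2 \mid x_1, z_1)$ are the standard parameters that a parametric approach has to model, which is where the dimensionality problem described above arises.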
Recently, Wang and Yin9,10 derived a new version of the G-formula, which identifies the SCE by the point effects of individual treatments in the sequence. The point effect of a treatment is simply its effect as defined in the causal inference of single-point treatment. Its estimation is well studied in the framework of single-point causal inference; for instance, it can be likelihood-based and, further, doubly robust to possible misspecification of the outcome model or the treatment assignment model.11 The estimation does not suffer from the curse of dimensionality or the null paradox. Based on the new G-formula, they proposed a parametric likelihood-based approach that estimates and tests the SCE by estimating and testing the point effects, and thereby avoids both problems. They achieved not only an unbiased estimate of the SCE but also the nominal coverage probability for the confidence interval.
In this article, we study the application of the parametric likelihood-based approach based on the new G-formula to the area of early cancer diagnosis. We estimate and test SCEs under various regimes of diagnoses and treatments, including the optimal one, as well as the modifications of SCEs by covariates. We illustrate our method with an example using data from a Swedish prognosis study of stomach cancer and provide practical advice.
2 Data and the assumption for estimating and testing SCEs
In Sweden, patients usually seek medical help at hospitals near where they live (that is, within their catchment areas). When cancer is diagnosed, they may stay at the diagnosing hospital or transfer to another hospital for treatment. We call the hospital that diagnoses the cancer the diagnosing hospital and the one that treats it the treating hospital. To evaluate the performance of diagnosing and treating hospitals, we may study cancer outcomes under various regimes of diagnosing and treating hospitals among cancer patients after adjusting for differences between patients. A question of relevance to public health policy is which type of diagnosing and treating hospital, large or small, performs better on cancer survival.
The data used in this study come from a prognosis study conducted between 1988 and 1995 in hospitals in central and northern Sweden.12
It contained information on 910 patients with stomach cancer. The large type refers to the regional or county hospitals and the small type to local hospitals. The diagnosing hospital is the treatment variable
The following stationary covariates before
Table 1. Frequencies or means (standard deviations) of covariates and outcome across the diagnosing and treating hospitals for 910 stomach cancer patients.
•Hospital types:
•Stationary covariates:
•Time-dependent covariate:
•Outcome:
Due to a long-term social welfare system and relatively uniform culture in Sweden, we assume that gender
Because the cancer stage has a significant influence on diagnosing hospital as well as on treating hospital, one cannot use the standard methods such as the usual regression to estimate and test the SCE.1,2 In the following, we will apply the new G-formula to estimate and test the SCE via the point effects of diagnosing and treating hospitals.
3 Estimating and testing SCE under the regime of diagnosing and treating hospitals on one-year survival
Wang and Yin9 constructed a point parametrization for the likelihood of all the observable variables arising from a sequence of treatments, with the point effects of treatments in the sequence as a subset of the parameters. They showed that, with this parametrization, the score functions for the point effects of treatments at different treatment times are only weakly associated with one another, so that the point effects can be estimated separately at different treatment times, like the point effect of a single-point treatment. Furthermore, Wang and Yin10 derived a new version of the G-formula, which expresses the point effects in terms of the blip effects of treatments and then all other SCEs in terms of the blip effects. As a result, a likelihood approach has been developed for estimating and testing the blip effect and the SCE via these point effects.9,10 Here, we introduce a concrete procedure for applying their method to an observational study, where the treatment assignment mechanism is unknown and the covariates can be continuous.
3.1 Point effects of diagnosing and treating hospitals
Let
To estimate the point effect of diagnosing hospital
Table 2. Point effects, blip effects and optimal SCEs of diagnosing and treating hospitals on one-year survival of stomach cancer: estimate, P-value and 95% CI.
• Five point effects:
• Five blip effects:
• Five optimal SCEs:
• Modification of the blip effect by age:
To estimate the point effect of treating hospital
Briefly, we estimate the point effects of diagnosing and treating hospitals by modelling the means
3.2 Blip effects of diagnosing and treating hospitals
Because neither (1) nor (2) contains the geographic area
An interesting clinical observation in cancer diagnosis is that young patients diagnosed at small hospitals tend to have poor prognoses.13 This phenomenon is known as doctors’ delay, but it has been little studied statistically. Motivated by this clinical observation, we suppose that
Now by applying the new G-formula10 (see also formula (10) for
Because
Because age
As an illustration, let
Let
Now, conditional on all covariates, diagnosing and treating hospitals, we use (5a) and (5b) as a regression model to estimate
Briefly, we estimate and test the blip effects of diagnosing and treating hospitals in the usual framework of regression. The result is presented in Table 2.
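To make "the usual framework of regression" concrete, the following is a minimal sketch of how an effect and its modification by age could be estimated and tested with ordinary least squares and Wald statistics. It is not the article's models (5a) and (5b): the data are simulated, the variable names (`age`, `z`, `y`), the true coefficients and the helper `ols_wald` are all hypothetical, and classical rather than robust standard errors are used for brevity.

```python
import math
import numpy as np

def ols_wald(X, y):
    """OLS fit with classical standard errors, two-sided Wald P-values and 95% CIs."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)           # residual variance
    se = np.sqrt(np.diag(sigma2 * xtx_inv))
    z_stat = beta / se
    # Two-sided normal P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    pval = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z_stat])
    ci = np.column_stack([beta - 1.96 * se, beta + 1.96 * se])
    return beta, se, pval, ci

# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n = 5000
age = rng.normal(0.0, 1.0, n)                  # centred, standardised age
z = rng.binomial(1, 0.5, n)                    # 1 = large hospital, 0 = small hospital
true_effect, true_modification = 0.10, -0.05
prob = 0.5 + 0.05 * age + z * (true_effect + true_modification * age)
y = rng.binomial(1, np.clip(prob, 0.01, 0.99)).astype(float)

# Design matrix: intercept, age, hospital type, hospital type x age
X = np.column_stack([np.ones(n), age, z, z * age])
beta, se, pval, ci = ols_wald(X, y)

for name, b, p_, (lo, hi) in zip(["intercept", "age", "z", "z:age"], beta, pval, ci):
    print(f"{name:9s} estimate={b: .3f}  P={p_:.3g}  95% CI=({lo: .3f}, {hi: .3f})")
```

The coefficient of `z` plays the role of an effect at the reference age and the coefficient of `z:age` the role of its modification by age; each comes with an estimate, a P-value and a 95% CI, which is the layout of the article's result tables.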
3.3 SCE under any regime of diagnosing and treating hospitals
First, we consider the
In (6) replacing
By the dynamic programming procedure, we estimate the optimal regime
From the estimate
Briefly, we use one estimated SNMM to estimate SCEs under various regimes and thus can compare these regimes by the hypothesis testing of the SCEs under these regimes.
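The step from blip effects to the value of a regime, and the backward search for the optimal regime, can be sketched in a simplified two-stage setting. Everything below is an assumption made for illustration: the numbers are hypothetical and not estimates from the article, baseline covariates are ignored, the cancer stage takes only two values, and the names `MU0`, `PHI1`, `PHI2` and `P_STAGE` are invented here.

```python
# Hypothetical inputs (NOT estimates from the article).
MU0 = 0.40    # mean outcome under the reference regime (z1 = 0, z2 = 0)
PHI1 = 0.05   # blip effect of z1 = 1 versus z1 = 0, when followed by z2 = 0
# Blip effect of z2 = 1 versus z2 = 0, given the first-stage choice z1 and the stage
PHI2 = {0: {"early": 0.10, "late": -0.02},
        1: {"early": 0.04, "late": 0.06}}
# Distribution of cancer stage given the first-stage choice z1
P_STAGE = {0: {"early": 0.3, "late": 0.7},
           1: {"early": 0.5, "late": 0.5}}

def regime_value(z1, rule2):
    """Mean outcome under first-stage choice z1 and second-stage rule rule2[stage] in {0, 1}."""
    gain2 = sum(P_STAGE[z1][s] * rule2[s] * PHI2[z1][s] for s in P_STAGE[z1])
    return MU0 + z1 * PHI1 + gain2

def optimal_regime():
    """Backward induction: optimise the second stage first, then the first stage."""
    best = None
    for z1 in (0, 1):
        # Stage 2: choose z2 = 1 exactly when its blip effect is positive
        rule2 = {s: int(PHI2[z1][s] > 0) for s in PHI2[z1]}
        value = regime_value(z1, rule2)
        # Stage 1: keep the first-stage choice with the larger optimised value
        if best is None or value > best[2]:
            best = (z1, rule2, value)
    return best

best_z1, best_rule2, best_value = optimal_regime()
print(best_z1, best_rule2, round(best_value, 3))
```

In this toy setting the same set of blip effects yields the value of every regime, so any two regimes can be compared on a common footing; the uncertainty of such comparisons is what the hypothesis tests in the article address and is not reproduced here.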
3.4 Causal analysis of diagnosing and treating hospitals based on Table 2
First, we analyse treating hospital
For the cancer stage
For the cancer stage
Second, we analyse the diagnosing hospital. The blip effect
Third, we analyse the optimal regime in the population. Taking the average of
Here, all effects are measured by the difference in mean survival, but they can also be measured by the difference in a function of mean survival. The new G-formula may be applied to estimate causal effects measured as odds ratios, rate ratios and hazard ratios. Although these estimates are consistent, the non-collapsibility of a non-linear measure becomes far worse in the context of sequential treatments, leading in finite samples to biased estimates of these measures and to confidence intervals whose coverage probabilities deviate from the nominal level. We therefore recommend using the linear measure for the SCE.
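Non-collapsibility itself can be seen in a small numerical example. The sketch below covers only the single-point case with two equally sized strata and no confounding, using hypothetical survival probabilities; it does not reproduce the sequential setting discussed above.

```python
def odds_ratio(p1, p0):
    """Odds ratio comparing outcome probability p1 (treated) with p0 (untreated)."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Hypothetical strata: (P(Y=1 | z=1), P(Y=1 | z=0)); treatment is unrelated to stratum
strata = {"A": (0.5, 0.2), "B": (0.8, 0.5)}
weights = {"A": 0.5, "B": 0.5}

conditional_or = {k: odds_ratio(p1, p0) for k, (p1, p0) in strata.items()}
conditional_rd = {k: p1 - p0 for k, (p1, p0) in strata.items()}

# Marginal outcome probabilities, averaging over the strata
marg_p1 = sum(weights[k] * strata[k][0] for k in strata)
marg_p0 = sum(weights[k] * strata[k][1] for k in strata)
marginal_or = odds_ratio(marg_p1, marg_p0)
marginal_rd = marg_p1 - marg_p0

print("conditional odds ratios:", conditional_or)
print("marginal odds ratio:    ", round(marginal_or, 3))
print("conditional risk diffs: ", conditional_rd)
print("marginal risk diff:     ", round(marginal_rd, 3))
```

The odds ratio is 4 in both strata, yet the marginal odds ratio is about 3.45, although there is no confounding. The risk difference is 0.3 in each stratum and 0.3 marginally, which is the sense in which the linear measure is collapsible.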
In the medical example of this article, there was no censoring. Suppose there was non-informative censoring, such as loss to follow-up. Based on the new G-formula, we may address loss to follow-up in the framework of single-point causal inference.9,10 When estimating and testing the point effects and blip effects in Sections 3.1 and 3.2, we may simply remove the censored patients from the estimation. When estimating SCEs under various regimes in Section 3.3, we may simply remove the censored patients if the loss to follow-up occurred between
4 Estimating and testing SCEs of cancer diagnosis and treatment on three-year survival
We replace one-year survival with three-year survival and follow the same procedure as in Section 3 to estimate and test the blip effect and the SCE. The result is presented in Table 3.
Table 3. Point effects, blip effects and optimal SCEs of diagnosing and treating hospitals on three-year survival of stomach cancer: estimate, P-value and 95% CI.
• Five point effects:
• Five blip effects:
• Five optimal SCEs:
Only for the cancer stage
5 Comparison of our method with available methods
Here, we only consider one-year survival. Method (i) is our method described in Section 3. Method (ii) is the parametric method based on Robins’ G-formula.2–4 Method (iii) is the marginal structural model based on the inverse probability of treatment weighting.4–6 Method (iv) is the G-estimation based on SNMM or optimal-regime SNMM;2,4,7,8 this method incorporates both Q-learning and A-learning in estimating optimal dynamic treatment regimes.
With our data, we aim to examine the modelling assumptions behind these methods and their ability to estimate SCEs and to compare the underlying regimes of diagnosing and treating hospitals. These methods are active areas in the literature,4,8 but to focus on the problems, we do not use the advanced versions of these methods that alleviate them. Methods (ii)–(iv) are also described in the context of this article in Supplemental Data. See also a simulation study for a comparison between these methods.10
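For readers less familiar with method (iii), stabilized inverse-probability-of-treatment weights for a two-stage sequence can be sketched as follows. This is an illustration under strong simplifying assumptions and not the analysis in the Supplemental Data: the data are simulated, all variables (`x1`, `z1`, `s`, `z2`, `y`) are hypothetical and binary, and the treatment probabilities are estimated by empirical frequencies through the invented helper `cond_prob`. With continuous covariates, as in the article, a model such as logistic regression would be needed instead.

```python
import numpy as np

def cond_prob(target, *conds):
    """Empirical P(target = observed value | conds = observed values), per subject.

    All arguments are integer 0/1 arrays; with no conditioning variables
    this is the marginal frequency of each subject's observed target value.
    """
    key = np.zeros_like(target)
    for c in conds:
        key = 2 * key + c                      # encode the conditioning cell
    joint = 2 * key + target                   # encode cell plus target value
    return np.bincount(joint)[joint] / np.bincount(key)[key]

# Simulated two-stage data (hypothetical)
rng = np.random.default_rng(1)
n = 20000
x1 = rng.binomial(1, 0.5, n)                          # baseline covariate
z1 = rng.binomial(1, 0.3 + 0.4 * x1)                  # first treatment
s = rng.binomial(1, 0.4 + 0.2 * x1 - 0.1 * z1)        # intermediate covariate (stage)
z2 = rng.binomial(1, 0.2 + 0.3 * s + 0.2 * z1)        # second treatment
y = rng.binomial(1, 0.3 + 0.1 * z1 + 0.1 * z2 - 0.15 * s + 0.1 * x1)

# Stabilized weight: product over stages of
# P(treatment | past treatment) / P(treatment | past treatment and covariates)
sw = (cond_prob(z1) / cond_prob(z1, x1)) * (cond_prob(z2, z1) / cond_prob(z2, x1, z1, s))

def weighted_mean(a1, a2):
    """Weighted mean outcome among subjects who followed the static regime (a1, a2)."""
    m = (z1 == a1) & (z2 == a2)
    return np.sum(sw[m] * y[m]) / np.sum(sw[m])

effect = weighted_mean(1, 1) - weighted_mean(0, 0)
print("mean stabilized weight:", round(float(sw.mean()), 4))
print("estimated effect of (1, 1) versus (0, 0):", round(float(effect), 3))
```

In this simulation the true contrast between the static regimes (1, 1) and (0, 0) is 0.215 by construction, so the printed estimate can be checked against it. A mean stabilized weight close to 1 is a common diagnostic for the weight models.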
Because methods (i)–(iv) may lead to the same inference of the causal effect of treating hospital, we focus on the causal effect of diagnosing hospital. In Section 3, we have applied method (i) to estimate the blip effect
Table 4. Comparison of our method with available methods in Section 5: estimate, P-value and 95% confidence interval (95% CI) for causal effects of diagnosing and treating hospitals on one-year cancer survival.
• Four estimation methods: method (i) our method; method (ii) the parametric method based on Robins’ G-formula; method (iii) the marginal structural model based on inverse probability of treatment weighting; method (iv) the G-estimation based on SNMM or optimal-regime SNMM. Empty cells indicate effects that are not easily estimable by the method.
• Causal effects:
With method (i), the modelling assumptions for the outcome are models (1) and (2) for estimating the point effects of
With method (ii), we assume an unsaturated outcome model for the standard parameters
With method (iii), we assume models for the probabilities of assigning diagnosing and treating hospitals and use the estimated probabilities to calculate stabilized weight and non-stabilized weight for the outcome. With the stabilized weighted outcome, we estimate the mean of the potential outcome under a static regime and thus
With method (iv), to estimate
6 Conclusion
In recent years, a huge amount of clinical data has become available, for instance, from various Swedish quality registers, which contain almost all economic and social information of a patient as well as a nearly complete record of clinical visits for the diagnoses and treatments of various cancers. Such data should lead to a large variety of comprehensive longitudinal studies of the influences of early diagnosis on various cancer outcomes such as survival and progression. In these studies, one needs not only to estimate SCEs under various regimes of diagnoses and treatments but also to compare these regimes.
In this article, we study the application of a parametric likelihood-based approach,10 which allows for not only an unbiased estimation of SCEs under various regimes but also a comparison between these regimes by testing these SCEs under the same modelling assumption. Our method is implemented in three steps: first, to estimate and test the point effects; second, to use the estimated point effects to estimate and test the blip effects; and finally, to use the estimated blip effects to estimate and test SCEs. Each of these steps can be carried out using the usual regression and can be examined using the usual modelling tools, familiar to epidemiologists, in the causal inference of single-point treatment. Note that the SCE is a parameter that involves the entire data-generating mechanism of all covariates, treatments and the outcome across different diagnosing and treating stages; consequently, it is highly difficult for a single procedure or algorithm to achieve both reliable estimation and reliable hypothesis testing of the SCE.
Our medical example contains most of the essential components for evaluating the performance of cancer diagnosis: the blip effect, its modification by covariates and the SCE under a general regime. It is also an observational study with continuous covariates. Because the setting is simple, we can readily extend this example to various cancer types; to different cancer outcomes such as cancer progression, quality of life and others; to different diagnosing techniques such as biomarkers or a sequence of screening steps; and to different modification factors such as socio-economic status, comorbidity and family history. With complex covariate settings in cancer diagnoses, we believe that we may also apply more advanced methods than the usual regression, such as the targeted maximum likelihood method and even machine learning, to estimate the point effects.11 If it is difficult to specify the distribution of the observable variables, we believe that we may also employ semi-parametric or non-parametric methods to estimate the point effects.
Supplemental Material
sj-docx-1-smm-10.1177_09622802221098429 - Supplemental material for Estimating and testing the influence of early diagnosis on cancer survival via point effects of diagnoses and treatments
Supplemental material, sj-docx-1-smm-10.1177_09622802221098429 for Estimating and testing the influence of early diagnosis on cancer survival via point effects of diagnoses and treatments by Xiaoqin Wang, Johannes Blom, Weimin Ye and Li Yin in Statistical Methods in Medical Research
Footnotes
Acknowledgements
Xiaoqin Wang and Li Yin were partially supported by the Swedish Research Council with the grant number 2019-02913. All authors are grateful to the anonymous reviewer and editor for their comments and suggestions, which have considerably improved the article.
Ethics approval
The proposed research is covered by the ethical committee approval (DNR880113/13, §121) from the ethical review board of Uppsala University.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author contribution
The four authors contributed equally.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Xiaoqin Wang and Li Yin were partially supported by the Swedish Research Council with the grant number 2019-02913.
Supplemental material
(a) A general description of the new G-formula for the SCE in terms of point effects of treatments and a description of available methods (ii), (iii) and (iv) in the context of the medical example. (b) Data and code for all analyses of this article are available in Zenodo.14
References
