
Abstract

Background

Immune checkpoint inhibitors (ICIs) provide a significant survival benefit in non-small cell lung cancer (NSCLC) patients; however, accurately predicting which patients will benefit remains a challenge. As previously shown, the STOP model, a machine learning model based on serum tumor markers, is capable of identifying non-responders after 6 weeks of ICIs.

Objective

This study aims to externally validate this model and to assess the predictive value in combination with radiological response assessment using RECIST criteria.

Methods

In a cohort of 242 metastatic NSCLC patients, CYFRA, CEA, and NSE were measured before start and after 6 weeks of ICI treatment. The ability of the STOP model to predict no durable benefit (NDB; progressive disease, death within 6 months or disease control of less than 6 months) was assessed using specificity and positive predictive value (PPV). Moreover, a combination of the STOP model with RECIST after 6–8 weeks of ICIs was investigated.

Results

The STOP model achieved a specificity of 96% (95% CI 95%–97%) and a PPV of predicting NDB of 88.1% (95% CI 85.9%–90.3%). Combining the STOP model with RECIST improved specificity and PPV to 100% and predicted NDB a median of 11.6 weeks (IQR 1.8–18.0 weeks) before radiologically defined progression.

Conclusions

After 6 weeks of ICIs, the blood-based STOP model was capable of accurately predicting NDB in metastatic NSCLC patients, earlier than conventional radiological assessment. The combined serological and radiological response assessment creates an early opportunity to safely stop ICI treatment in patients who will not benefit, although the clinical utility of the assay is limited since the high specificity comes at the cost of a lower sensitivity.

Keywords

immune checkpoint inhibitors (ICIs), non-small cell lung cancer (NSCLC), serum tumor markers, no durable benefit (NDB), external validation

Highlights

• Only a minority of NSCLC patients treated with immunotherapy achieve a durable clinical benefit.

• Our study successfully validated a tumor marker model to predict which patients will not develop a durable benefit.

• The STOP model (Serum Tumor marker-based Outcome Prediction model) based on CYFRA, CEA, and NSE measurements achieved 96% specificity.

• The combined model predicted no durable benefit a median of 11.6 weeks earlier than conventional radiological assessment.

• This model creates an early opportunity to stop immunotherapy in patients who will not benefit.

Introduction

Immune checkpoint inhibitors (ICIs) have provided a significant survival benefit in metastatic non-small cell lung cancer (NSCLC) patients.^1–4 Nonetheless, the 5-year overall survival of 17–24% in patients treated with first-line ICIs shows that only a minority of patients experience a durable clinical benefit.^5–7 Accurately predicting which patients will benefit remains a challenge. For NSCLC patients not harboring an actionable mutation, ICIs with or without concurrent chemotherapy have become the standard first-line therapeutic approach.⁸ In current clinical practice, an oncologist’s decision to continue or terminate ICI therapy is based on tumor response assessment including clinical symptoms and radiographic evaluation using Response Evaluation Criteria in Solid Tumors (RECIST).⁹ However, radiographic assessment may not correspond to clinical benefit, since response to ICIs can be delayed and extrathoracic disease can be missed in routine follow-up with thoracic CT scans. Timely identification of no durable benefit (NDB) could enable early treatment discontinuation, preventing prolonged exposure to ineffective and potentially harmful therapy and providing a window of opportunity for alternative, potentially more beneficial treatment options.

Two biomarker strategies may be used to minimize the amount of ICIs administered to patients who eventually turn out not to benefit from this therapy. First, before the start of treatment, biomarkers can select those patients who are most likely to benefit from ICIs. Examples of upfront predictive biomarkers are PD-L1 and tumor mutational burden (TMB). However, several studies have shown that predictions by PD-L1 and TMB are not precise enough to withhold treatment, since lung cancers across all PD-L1 expression levels and TMB values may respond to ICIs.^10,11 Other limitations include the temporal and spatial heterogeneity of PD-L1 expression and the lack of consensus on which TMB assay and corresponding cut-off point to use.^12,13 A second strategy is to use biomarkers at an early stage of treatment to identify those patients who will likely not respond to therapy. The advantage of this approach is that actual tumor response can be captured and therefore prediction may be more precise. Since the consequence of incorrectly stopped treatment includes the loss of potential treatment benefit, false positive results (i.e., NDB prediction in patients who do benefit) should be minimized. Therefore, high specificity and positive predictive value (PPV) of NDB predictions are necessary.

Previous studies have shown the potential of serial measurements of carcinoembryonic antigen (CEA), cytokeratin 19 fragment (CYFRA), neuron-specific enolase (NSE), and cancer antigen 125 (CA125) in monitoring therapy response and early detection of disease progression in NSCLC patients.^14–18 This study investigates whether these markers are capable of predicting ICI benefit. Van Delft et al. compared multiple methods of model development, including logistic regression and machine learning techniques, to predict non-response using tumor markers in the first 6 weeks of ICI treatment.¹⁹ In a cohort of NSCLC patients treated with ICIs, an algorithm including CYFRA, CEA, and NSE provided the most robust performance and identified 61% of patients with an NDB (i.e., progressive disease, death within 6 months of treatment, or disease control of less than 6 months) at a specificity of 95%. Previous studies have also shown prognostic value of RECIST criteria, since an increase in tumor size was associated with shorter OS.^20,21

Implementation of biomarkers is often challenging and only few biomarkers make it to actual use in clinical practice, mainly due to a lack of external validation in the target population.^22,23 Therefore, we aimed to (1) externally validate a previously developed tumor marker algorithm to predict NDB based on measurements obtained during the first 6 weeks of ICIs in a real-world cohort and (2) assess the added value of combining this serum tumor marker algorithm with radiological response evaluation using RECIST.

Methods

Study population

Consecutive patients were retrospectively selected from two hospitals in the Netherlands: the Netherlands Cancer Institute (NKI) and the Radboud university medical center. Inclusion criteria were stage IV NSCLC and treatment initiation with PD-(L)1 inhibitors with or without chemotherapy between January 2018 and January 2022. Patients were included in the analyses based on the availability of serum tumor marker results. Clinical characteristics including age, sex, smoking status, histology of the lung tumor, treatment, and survival were collected from electronic patient files (Table 1). To evaluate tumor response, CT scans of the chest and, if indicated, of the upper abdomen and cerebrum were assessed using RECIST version 1.1 by two independent observers who were blinded to the model predictions.⁹ Radiological response was assessed at 6–8 weeks after start of treatment, followed by 3-monthly follow-up. Patients were followed from ICI therapy initiation to death or date of censoring on 1 March 2023. The clinical endpoint NDB was defined as progressive disease, death within 6 months of ICI treatment or disease control of less than 6 months (i.e., complete response, partial response, or stable disease lasting less than 6 months). Conversely, patients with a partial response or stable disease by RECIST of more than 6 months were categorized as durable clinical benefit (DCB).²⁴ The outcome metric NDB versus DCB was chosen because the incorporated 6-month time window makes it possible to differentiate patients based on the effect of ICIs rather than the effect of chemotherapy in the case of combined chemo-immunotherapy. Progression-free survival (PFS) was calculated from the date of the first ICI infusion to the time of death or first radiological progression by RECIST, whichever came first, or censored at most recent follow-up. Overall survival (OS) was calculated from the date of first ICI infusion to time of death or censored at most recent follow-up.
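The endpoint definitions above can be made concrete with a small sketch. This is an illustration only, not the authors' code: the function names, the 183-day approximation of "6 months", the strict `<` boundary, and the "not evaluable" category for patients censored early without an event are all assumptions introduced here.

```python
from typing import Optional

# Assumption: "6 months" is approximated as 183 days; the text gives no day count.
SIX_MONTHS_DAYS = 183


def classify_outcome(
    days_to_progression: Optional[int],
    days_to_death: Optional[int],
    days_of_follow_up: int,
) -> str:
    """Classify a patient as NDB or DCB from days since the first ICI infusion.

    NDB: progression or death before the 6-month mark.
    DCB: no progression or death before the 6-month mark.
    "not evaluable" (hypothetical category): censored before 6 months without an event.
    """
    events = [d for d in (days_to_progression, days_to_death) if d is not None]
    if events and min(events) < SIX_MONTHS_DAYS:
        return "NDB"
    if days_of_follow_up < SIX_MONTHS_DAYS:
        return "not evaluable"
    return "DCB"


def pfs_days(
    days_to_progression: Optional[int],
    days_to_death: Optional[int],
    days_of_follow_up: int,
) -> tuple:
    """Return (time, event_observed) for progression-free survival.

    The event is death or first radiological progression, whichever came first;
    otherwise the patient is censored at the most recent follow-up.
    """
    events = [d for d in (days_to_progression, days_to_death) if d is not None]
    if events:
        return min(events), True
    return days_of_follow_up, False
```

For example, a hypothetical patient with progression on day 60 would be classified as NDB with a PFS of 60 days, whereas a patient without events after 400 days of follow-up would be classified as DCB and censored at day 400.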

Table 1.

Clinical characteristics of the validation cohort (n = 242).

	Validation cohort (N = 242)
Mean age in years (SD)	64.3 (9.5)
Females (%)	122 (50.4)
Stage of disease (%)
IIIB-C	15 (6.2)
IVA	97 (40.1)
IVB	130 (53.7)
ECOG performance status at therapy start (%)
0	65 (26.9)
1	156 (64.5)
2	19 (7.9)
3	2 (0.8)
Smoking status (%)
Smoker	57 (23.6)
Former smoker	157 (64.9)
Never smoker	28 (11.6)
Mean pack years (SD)	35.4 (23.5)
Brain metastasis (%)	43 (17.8)
Histology lung tumor (%)
Adenocarcinoma	180 (74.4)
Squamous cell carcinoma	36 (14.9)
Other^b	26 (10.7)
PD-L1 status (%)
<1%	92 (38.0)
1–50%	61 (25.2)
>50%	84 (34.7)
Not tested	5 (2.1)
Treatment (%)
Immunotherapy	113 (46.7)
Immuno-chemotherapy	129 (53.3)
Lines of systemic therapy prior to immunotherapy (%)
0	113 (46.7)
1	77 (31.8)
≥2	52 (21.5)
Reason for treatment discontinuation (%)
Progressive disease	136 (56.2)
Immune-related adverse events	46 (19.0)
Treatment completed	24 (9.9)
Lost to follow-up	7 (2.9)
Treatment ongoing at moment of database lock	19 (7.9)
Other^c	10 (4.1)
RECIST at 6 months after treatment initiation (%)
Complete response	3 (1.2)
Partial response	59 (24.4)
Stable disease	68 (28.1)
Progressive disease	52 (21.5)
Other^d	60 (24.8)
NDB (%)	111 (45.9)
Mean duration of immunotherapy (SD)	8.6 months (8.1)
Mean duration of follow-up (SD)	19.6 months (14.3)

SD: standard deviation, ECOG: Eastern Cooperative Oncology Group, PD-L1: programmed death-ligand 1.

^aMono immunotherapy consisted of pembrolizumab (68 patients), nivolumab (24 patients), atezolizumab (8 patients), avelumab (2 patients), or durvalumab (1 patient). Chemo-immunotherapy regimes consisted of carbo- or cisplatin, paclitaxel or pemetrexed with pembrolizumab (82 patients) or carboplatin, paclitaxel, bevacizumab with atezolizumab (34 patients).

^bThe category “other” histology subtype includes large cell neuroendocrine carcinoma (10 patients), NSCLC-NOS (not otherwise specified; 10 patients), adenosquamous carcinoma (5 patients), and simultaneous adenocarcinoma and squamous cell carcinoma (1 patient).

^cThe category “other” reasons for treatment discontinuation includes patient’s preference, patient’s condition, or complications.

^dThe category “other” RECIST includes patients where no CT scan was performed after 6 months because of poor condition, patients who were already deceased or CT scan evaluation after 6 months was not applicable because patients developed progressive disease earlier and started a new treatment.

This study was conducted according to the guidelines of the Declaration of Helsinki. This study was approved by the local institutional review board of the Netherlands Cancer Institute (IRBd21-149) and the medical research ethics committee of the Radboud university medical center (2021–13207). This validation has been performed according to the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) statement.²⁵

Tumor marker measurements

Blood samples for tumor marker measurements were collected prior to treatment initiation and every 3 weeks thereafter as part of regular care. Directly after blood collection, the concentrations of CYFRA, CEA, CA-125, and NSE were measured using a Roche Cobas 6000, 8000 or a Roche Cobas Pro analyzer system (Roche Diagnostics, Germany). Tumor marker results at two time points, baseline and 6 weeks after ICI initiation, were required. The baseline measurement was defined as the measurement between 31 days before and 7 days after treatment initiation. The 6-week time point aligned with the first CT scan to evaluate tumor response during treatment and was set between 35 and 49 days after start of treatment. If multiple measurements within the predefined time range were available, the tumor marker measurement closest to day 0 (i.e., start of treatment) or day 42 (i.e., week 6) was taken.
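The time-window rule described above can be sketched as follows. The window bounds and target days come from the text; the function names, the data layout (day relative to ICI start, concentration), the tie-breaking behavior, and the example values are hypothetical.

```python
from typing import List, Optional, Tuple

# (day relative to start of ICI treatment, marker concentration)
Measurement = Tuple[int, float]

BASELINE_WINDOW = (-31, 7)  # 31 days before to 7 days after treatment initiation
WEEK6_WINDOW = (35, 49)     # 35 to 49 days after start of treatment


def pick_closest(
    measurements: List[Measurement],
    window: Tuple[int, int],
    target_day: int,
) -> Optional[Measurement]:
    """Return the measurement inside `window` that lies closest to `target_day`.

    Returns None if no measurement falls inside the window.
    Assumption: on a tie, the measurement listed first is kept.
    """
    first_day, last_day = window
    candidates = [m for m in measurements if first_day <= m[0] <= last_day]
    if not candidates:
        return None
    return min(candidates, key=lambda m: abs(m[0] - target_day))


def select_timepoints(measurements: List[Measurement]):
    """Pick the baseline (closest to day 0) and week-6 (closest to day 42) values."""
    baseline = pick_closest(measurements, BASELINE_WINDOW, 0)
    week6 = pick_closest(measurements, WEEK6_WINDOW, 42)
    return baseline, week6


# Hypothetical CYFRA series for one patient (values invented for illustration).
cyfra = [(-20, 3.1), (-2, 3.4), (21, 4.0), (40, 5.2), (48, 6.0)]
baseline, week6 = select_timepoints(cyfra)
```

In this invented series the day -2 and day 40 measurements are selected. A patient for whom either value is `None` would lack one of the two required time points.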

Tumor marker models for validation

In a previous study, tumor marker models based on combinations of CYFRA, CEA, CA-125, and NSE were developed.^19,26 Model development aimed to predict non-response with high specificity in order to avoid false positive results, which could lead to undertreatment when ICI therapy is withdrawn. The five models which achieved high specificity (i.e., >90%) and reasonable sensitivity (i.e., >55%) in the validation set of that study were selected for external validation: (1) boosting with CYFRA and CEA; (2) boosting with CYFRA, CEA, and NSE; (3) boosting with CYFRA, CEA, NSE, and CA-125; (4) random forest with CYFRA, CEA, and NSE; and (5) recurrent neural network with CYFRA, CEA, and NSE.

To assess the predictive value of radiographic evaluation by RECIST, we calculated the concordance between progressive disease according to RECIST at 6–8 weeks after treatment initiation and NDB. Next, we investigated whether combining early RECIST evaluation with a serum tumor marker model would increase accuracy in predicting NDB compared to using RECIST or the serum tumor marker model alone. For this purpose, we calculated the specificity, sensitivity, PPV, and NPV of (1) the serum tumor marker model alone; (2) RECIST alone; and (3) the serum tumor marker model combined with RECIST (Figure 1(a)).

Figure 1.

Study design. (a) A previously developed tumor marker model based on CYFRA, CEA, and NSE measurements at baseline (day 0) and week 6 (i.e., STOP model) was combined with radiographic evaluation by RECIST after 6 weeks to predict NDB after 24 weeks of ICI treatment. (b) Flowchart of inclusion of patients in this validation cohort.

Statistical analyses

Sample size calculations were performed to assess how many patients in the external validation cohort would be needed to obtain results with a predefined confidence interval.²⁷ Based on an observed/expected ratio, C-statistic and net benefit calculation which target a 95% confidence interval width of 0.3, 0.15 and 0.3, respectively, the largest sample size needed was 206 patients with a minimum of 93 events (i.e., NDB cases).

For each tumor marker model, only complete cases with measurement results of the included tumor markers were analyzed. For sensitivity, specificity, PPV, and NPV, 95% confidence intervals for proportions were calculated. Potential confounding by type of treatment, treatment line, histology subtype and smoking status was assessed by calculating the performance of the model in different subgroups. Kaplan–Meier analyses were conducted for PFS and OS and the logrank test was used for statistical comparison between groups. Hazard ratios were calculated using Cox proportional hazards regression analyses. A calibration plot was designed to test the agreement between the observed and predicted outcomes and the calibration slope was calculated. All analyses were executed in R version 4.0.4 using “ggplot,” “survival,” and “survminer” packages.^28–30
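As a reminder of what the Kaplan–Meier analysis computes, the product-limit estimator can be sketched in a few lines. The study itself used the R "survival" package; the plain-Python version below is only an illustration, and the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    times:  follow-up time per patient (event time or censoring time)
    events: True if the event was observed, False if the patient was censored
    Returns (time, survival probability) at each distinct event time.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # patients leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # events and censored patients both leave
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Toy data in months (invented): True = progression or death, False = censored.
toy_times = [2, 3, 3, 5, 8, 8, 12]
toy_events = [True, True, False, True, True, False, False]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_median = median_survival(toy_curve)
```

Censored patients contribute to the number at risk until their censoring time but never trigger a drop in the curve, which is why the estimate differs from a simple proportion of patients without an event.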

Results

Clinical characteristics of the validation cohort

In total, 242 patients were included (Figure 1(b), Supplementary Figure 1). Patients had a mean age of 64 years, were mainly current or former smokers (88.4%) and mostly diagnosed with an adenocarcinoma (74.4%) (Table 1). Approximately half of the cohort was treated with ICIs only (46.7%) and the other half was treated with ICIs in combination with chemotherapy (53.3%). The largest group of patients received ICIs as a first-line treatment (46.7%). In total, 111 out of 242 patients developed an NDB (45.9%). The median PFS and OS of the whole cohort were 7.3 months (95% CI 6.4–8.9 months) and 19.0 months (95% CI 15.7–24.4 months), respectively (Supplementary Figure 2A-B). The median follow-up of our cohort was 16.0 months (range 1.4–90.6 months).

Performance of the tumor marker model in the external validation cohort

Performance of the models was assessed with a primary focus on specificity to avoid withdrawing potentially beneficial treatment. In our cohort, the previously developed random forest model based on a combination of CYFRA, CEA, and NSE achieved the highest combination of specificity and sensitivity of 95.9% (95% CI 94.6%–97.2%) and 38.1% (95% CI 34.9%–41.4%), respectively (Table 2; Supplementary Tables 1A-B and 2). Therefore, this model was selected for further evaluation and named the STOP (Serum Tumor marker-based Outcome Prediction) model. When comparing the performance of the STOP model between patients with immunotherapy only and chemo-immunotherapy, the two study sites of enrollment, histology subtypes or first line versus later line treatment, no substantial differences between sensitivity and specificity were observed (Supplementary Table 3; Supplementary Figure 3). In never smokers, the model showed a lower specificity of 90.9% (95% CI 85.4%–96.4%) compared to 96.4% (95% CI 95.1%–97.7%) for active or former smokers.

Table 2.

Predictive accuracy of STOP, RECIST, and the combination of these models.

	Prediction of no durable benefit (NDB)
	Specificity (95% CI)	PPV (95% CI)	Sensitivity (95% CI)	NPV (95% CI)
NDB by STOP model	95.9% (94.6–97.2)	88.1% (85.9–90.3)	38.1% (34.9–41.4)	66.1% (62.9–69.3)
Early PD^a by RECIST	95.9% (94.6–97.2)	87.2% (84.9–89.4)	35.1% (31.8–38.3)	65.0% (61.8–68.2)
Combined model ^b	100% (100–100)	100% (100–100)	20.6% (17.9–23.4)	61.3% (58.0–64.6)

^aEarly PD; progressive disease at the first RECIST evaluation at 6–8 weeks after start of treatment, sensitivity; proportion of non-responders which are correctly predicted by the test, specificity; proportion of responders which are correctly predicted by the test, PPV, positive predictive value; proportion of predicted non-responders which are actually non-responders, NPV, negative predictive value; proportion of predicted responders which are actually responders.

^bCombined model; NDB prediction by the STOP model in combination with early PD at CT scan evaluation by RECIST after 6–8 weeks of ICI treatment.

The detailed patient counts in 2 × 2 matrices are represented in Supplementary Table 5.
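The point estimates in Table 2 follow directly from 2 × 2 counts. The sketch below shows the arithmetic. The counts are not taken from Supplementary Table 5, which is not reproduced here; they are reconstructed from the reported percentages under the assumption of 219 evaluable patients (97 with NDB, 122 with DCB), and should be read as an illustration rather than as the published data. Confidence intervals are deliberately left out.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Diagnostic accuracy for predicting NDB (NDB is the 'positive' class).

    tp: NDB predicted, NDB observed      fp: NDB predicted, DCB observed
    fn: DCB predicted, NDB observed      tn: DCB predicted, DCB observed
    """
    return {
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
        "sensitivity": tp / (tp + fn),
        "npv": tn / (tn + fn) if (tn + fn) else float("nan"),
    }


# Assumption: counts back-calculated from the percentages in Table 2.
reconstructed_counts = {
    "STOP": dict(tp=37, fp=5, fn=60, tn=117),
    "RECIST": dict(tp=34, fp=5, fn=63, tn=117),
    "Combined": dict(tp=20, fp=0, fn=77, tn=122),
}

table2 = {
    name: {k: round(100 * v, 1) for k, v in diagnostic_metrics(**c).items()}
    for name, c in reconstructed_counts.items()
}

for name, row in table2.items():
    print(name, row)
```

Under these assumed counts the combined model has no false positives, which is what drives both specificity and PPV to 100%, while its sensitivity falls because only patients flagged by both tests are counted as predicted NDB.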

In five patients the STOP model incorrectly predicted an NDB outcome, resulting in the specificity of 96% (Supplementary Table 2; Supplementary Table 4A-B; Figure 2(a)). One of these patients was diagnosed with a membranous nephropathy secondary to lung cancer, resulting in elevated CEA throughout ICI treatment (range 296–1300 ng/mL) (Supplementary Figure 4). In two patients, an elevation in NSE was observed which likely resulted in the false positive prediction of the STOP model. In the other two patients, a small absolute elevation in tumor marker concentrations was present, but because the baseline concentrations were low this resulted in a relatively large increase in terms of percentage.

Figure 2.

Swimmer plots including STOP model and combined model predictions. (a) The predictions by the STOP model are categorized as DCB prediction (green circle) and NDB prediction (red circle). Swimmer plot in which patients are categorized based on whether they developed DCB (left window, n = 131) or NDB (right window, n = 111). All individual patients are represented as bars on the y-axis and time in months is plotted on the x-axis. The duration of ICI treatment is shown in light blue and the duration of follow-up in light gray. If a patient died, this is indicated with a gray square at the end of the bar. The occurrence of progressive disease is indicated with the orange triangle. (b) The predictions by the combined model (imaging and STOP model) are categorized as DCB prediction (green circle) and NDB prediction (red circle).

The STOP model, based on tumor marker measurements before start and after the first 6 weeks of treatment, predicted NDB with a median of 11.7 weeks (IQR 2.1–18.1 weeks) prior to radiologically defined progression, indicating a clinically significant gain in time by the STOP model compared to regular radiological assessment (Figure 2).

NDB prediction by the STOP model was associated with a median PFS of 1.9 months (95% CI 1.5–3.4 months) compared to 9.3 months (95% CI 8.1–14.5 months, p < 0.0001) for DCB prediction. The hazard ratio (HR) to differentiate between patients with and without response to ICIs was 4.2 (95% CI 2.9–6.1) (Figure 3). The calibration plot of the STOP model showed that the model performs well in the validation setting and that the event rate of NDB approaches the predicted risk of NDB by the model (Supplementary Figure 5). These data show that after 6 weeks of ICI treatment, the blood-based STOP model was capable of accurately predicting non-response in metastatic NSCLC patients.

Figure 3.

Progression-free survival analysis of the STOP model. Kaplan–Meier curve of progression-free survival categorized by the STOP model prediction at 6–8 weeks after ICI initiation.

Additional value of the STOP model next to RECIST evaluation

Progressive disease (PD) according to the first RECIST evaluation at 6–8 weeks after start of treatment corresponded with NDB after 6 months in 87% of patients (Table 2). Moreover, the proportion of patients with a DCB who were correctly predicted as non-PD by RECIST was 96%, which is comparable to the specificity of 96% of the STOP model. RECIST might be limited in its specificity and PPV of predicting DCB due to pseudoprogression. In our cohort, five patients in total (2.3%) were misclassified as having progressive disease at the first RECIST evaluation while they developed a durable benefit. For all five of these patients, the STOP model predicted a DCB, underscoring the added value of this model (Figure 4; Supplementary Figure 6; Supplementary Table 4A-B).

Figure 4.

Predictions by the STOP model in relation to RECIST evaluations. Scatter plot with class probabilities of the STOP model predictions on the y-axis and RECIST evaluation 6–8 weeks after ICI initiation on the x-axis. Patients were stratified by outcome of NDB (green circle) versus DCB (red triangle). The horizontal line at Y = 0.75 indicates the threshold identified in the previously published cohort by van Delft et al. to discriminate between NDB and DCB predictions of the STOP model.

Combining RECIST with the STOP model resulted in a specificity of 100% and PPV of predicting NDB of 100%, compared to 95.9% and 87.2% for RECIST and 95.9% and 88.1% for the STOP model alone, highlighting the complementarity of these approaches. The gain in specificity and PPV of the combined model comes at the cost of a decrease in sensitivity to 21%. The combined model predicted NDB with a median of 11.6 weeks (IQR 1.8–18.0 weeks) prior to radiologically defined progression, indicating a similar gain in time by the combined model compared to the STOP model alone.

When using the combined model, the median PFS of patients with a NDB prediction was 1.3 months (95% CI 1.2–1.5 months) compared to 8.7 months (95% CI 7.4–10.9 months) in patients with a DCB prediction (p < 0.0001; Figure 5(a); Supplementary Figure 7A). Evaluation of PFS depending on the outcome of the combined model showed a HR of 23.7 (95% CI 12.6–44.6), confirming the ability of the model to stratify patients based on their response to ICI treatment. The OS analysis showed a significant difference as well, with 4.9 months (95% CI 2.9–12.4 months) versus 24.2 months (95% CI 18.8–32.5 months) for NDB prediction versus DCB prediction, respectively, and a HR of 4.3 (95% CI 2.6–7.1) (p < 0.0001; Figure 5(b), Supplementary Figure 7B).

Figure 5.

Survival analyses of the STOP model in combination with RECIST. (a) Kaplan–Meier curve of progression-free survival analysis categorized by prediction of the STOP model combined with RECIST evaluation at 6–8 weeks after ICI initiation. * PR/SD; partial response or stable disease, PD; progressive disease. (b) Kaplan–Meier curve of overall survival analysis categorized by outcome by prediction of the STOP model combined with RECIST evaluation at 6–8 weeks after ICI initiation. * PR/SD; partial response or stable disease, PD; progressive disease.

How to use the combined model in clinical practice

In our cohort, 20 patients (9%) were categorized as non-responders by the combined model and indeed no patient achieved durable benefit (Figure 6). In these patients, ICIs could have been safely discontinued after a median of 6.1 weeks after the start of treatment. In 41 patients (19%), the predictions were contradictory (i.e., PD combined with negative STOP prediction or CR/PR/SD with a positive STOP prediction) and out of these 41, 31 patients (76%) developed NDB. Since the chance of NDB is relatively high, we would advise closely monitoring these patients and awaiting the next radiological evaluation at 3 months after start of treatment. Lastly, 158 patients (72%) were categorized as responders by the combined model. In these patients we would continue ICIs after the first 6 weeks of treatment.

Figure 6.

Potential application of the combined model to personalize treatment. Patients start with (chemo)-immunotherapy according to the national guidelines and after 6–8 weeks, radiological evaluation by RECIST and blood-based evaluation of serum tumor markers by the STOP model will be performed. When the CT scan indicates PD in combination with a positive prediction by the STOP model, our advice would be to stop ICIs and switch to another treatment. If the results are contradictory (i.e., PD combined with negative STOP prediction or PR/SD with positive STOP prediction), we would advise to await the next radiological evaluation. In case PR or SD is seen and the STOP model is negative, we would continue ICIs.
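The decision scheme described for the combined model can be written as a small rule. This is a sketch under stated assumptions, not a validated clinical tool: the 0.75 threshold is taken from the Figure 4 caption, while the function name, the interpretation of the class probability as the probability of NDB, and the use of `>=` at the threshold are assumptions made here.

```python
# Threshold from the Figure 4 caption; whether the boundary is inclusive is an assumption.
STOP_THRESHOLD = 0.75

VALID_RECIST = {"CR", "PR", "SD", "PD"}


def triage(stop_probability: float, recist: str) -> str:
    """Combine the STOP prediction with the first RECIST evaluation (6-8 weeks).

    stop_probability: class probability from the STOP model, assumed here to be
                      the predicted probability of NDB
    recist:           "CR", "PR", "SD" or "PD" at the first evaluation
    """
    if recist not in VALID_RECIST:
        raise ValueError(f"unknown RECIST category: {recist!r}")
    stop_positive = stop_probability >= STOP_THRESHOLD  # NDB predicted
    early_pd = recist == "PD"
    if stop_positive and early_pd:
        return "stop ICIs"
    if stop_positive or early_pd:
        return "await next radiological evaluation"
    return "continue ICIs"


# Group sizes reported in the text for the three branches.
reported_groups = {
    "stop ICIs": 20,
    "await next radiological evaluation": 41,
    "continue ICIs": 158,
}
total = sum(reported_groups.values())
shares = {k: round(100 * n / total) for k, n in reported_groups.items()}
```

With the reported group sizes, the three branches cover 219 patients and reproduce the 9%, 19%, and 72% shares given in the text.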

Discussion

Here we present the external validation of the previously published STOP model, which includes CYFRA, CEA and NSE measurements in the first 6 weeks of ICI treatment. Our results show that the STOP model was able to achieve 96% specificity in an external cohort. Furthermore, no confounding by type of therapy, line of treatment and histology subtype was observed. By combining the STOP model with readily available RECIST evaluations at 6–8 weeks after initiation of therapy, a specificity and positive predictive value of 100% were reached. Furthermore, in PD cases misclassified by RECIST (i.e., patients with pseudoprogression), the STOP model predicted DCB, which highlights the added value of this model. To the best of our knowledge, this is the first study that successfully validated a serum tumor marker model in order to discontinue ICI treatment in NSCLC patients without clinical benefit at an early stage.

Nowadays, the majority of metastatic NSCLC patients start with a treatment regimen which includes ICIs. Since the effect of ICI treatment may not be directly noticeable and progressive disease on a CT scan may also be the result of pseudoprogression, the decision to stop ICIs is often postponed.³¹ As a result, patients remain on ICIs for prolonged periods of time, which can have several negative effects. A higher number of pembrolizumab cycles has been associated with an increased risk of immune-related adverse events, implying that early discontinuation could reduce the risk of adverse events.³² Moreover, grade 3 or 4 immune-related adverse events have been reported in up to 20% of patients with single-agent ICI treatment, indicating the potential reduction in burden of immune-related adverse events.³³ Furthermore, the prolonged treatment in these patients also has a significant financial impact on the healthcare system, since ICI treatment costs up to 132,535 dollars per patient per year.³⁴

In our external validation cohort, the largest share of patients was treated with ICI as first-line treatment (46.7%), whereas in the cohort of van Delft et al., in which the STOP model was developed, almost all patients received ICI as part of a second or later line treatment (98.0%, Supplementary Table 6).²⁶ In our cohort, the combination of ICIs with chemotherapy was more prominent (53.3% vs 0%) and a lower percentage of patients with NDB (45.9%) was observed compared to the cohort of van Delft et al. (68.4%). Despite these differences between the cohorts, the STOP model achieved 96% specificity when predicting DCB, which is similar to the 93% specificity in the cohort in which the model was developed. The sensitivity, however, was considerably lower compared to the van Delft et al. cohort (38% vs 61%, respectively). Although this reduces the number of patients eligible to stop treatment early, it does not impact the validity of these decisions since the specificity remained high.

To implement a predictive biomarker model in clinical care, certain requirements should be considered, including the diagnostic performance, external validation and practical feasibility. The diagnostic performance of a test can be expressed in terms of specificity, sensitivity, positive and negative predictive value. These metrics show the actual effect of a predictive biomarker in a certain patient population. Nonetheless, previous studies investigating the predictive performance of tumor markers in the context of ICIs mainly focused on other outcome parameters such as a stratification in PFS or area under the curve (AUC).^16,17,35,36 Although these studies do show a clear difference in high and low risk groups, clinical utility of the biomarker cannot be judged based on these results. In our study, we investigated a tumor marker model specifically developed to withhold non-beneficial treatment. We focused on a high specificity and PPV for the prediction of DCB, since this is needed to minimize false positive predictions of NDB and thereby avoid treatment discontinuation in patients who are expected to benefit.

Assessing the performance of a predictive model in a cohort that differs from the training cohort, and that is similar to the population of intended usage, is deemed necessary to assess the generalizability.²³ As more biomarkers are added to a single prediction model, the model becomes more prone to overfitting, and validation is warranted to avoid this pitfall.³⁷ Nonetheless, only a minority of the published studies about biomarker models validate their results in a separate validation cohort. Positive examples include Donker et al., who presented a naïve Bayes classifier based on smoking status, histology, therapy line and 7 genes tracked in ctDNA that could predict durable benefit with 84% sensitivity and 55% specificity in a separate test cohort.³⁸ Furthermore, Nabet et al. developed a Bayesian model based on TMB, CD8 fractions, and ctDNA fold change within the first 4 weeks of treatment which achieved a specificity of 94% in their validation cohort.³⁹ Although the performance of the biomarker model of Nabet et al. is in line with our tumor marker model, their validation cohort consisted of only 38 patients and therefore warrants replication in a larger cohort. In contrast, we successfully included the number of patients required for sufficient power, based on sample size calculations performed before the start of the study.

Lastly, a biomarker model should be easy to implement, for which serum tumor markers have significant advantages over other biomarkers. Serum tumor markers have been (pre-)analytically well characterized and are generally available.⁴⁰ Other advantages are their quick turn-around time, the robust and quality-controlled automated instruments, and the cost-efficient measurements, since they are relatively cheap, allowing testing in a serial manner. Upfront biomarkers which are only measured before treatment, such as PD-L1, cannot predict NDB precisely enough to withhold treatment (Supplementary Table 7). Circulating tumor DNA (ctDNA) is another biomarker capable of discriminating between ICI responders and non-responders.^41–43 However, a major disadvantage of ctDNA is that it cannot be detected in 7–27% of metastatic NSCLC patients at baseline when using panel sequencing.^41–43 Larger gene panels and increased depth of coverage are needed to increase utility, but will also significantly increase costs, hampering implementation of ctDNA as a monitoring tool in clinical practice.

When we started including patients, tumor marker measurements were not yet part of regular clinical practice. During our inclusion period, tumor marker measurements became common practice in our hospitals. As a result, tumor marker data were missing in 70% of the patients who were theoretically eligible based on stage and treatment with ICIs.

Although this may limit the generalizability of our results, the main advantage of our study is that it comprises a real-world cohort treated with conventional treatment according to the clinical guidelines. Therefore, the results of our cohort may still generalize better than those of a prospective study, for which strict inclusion criteria are often used. An intrinsic limitation of tumor markers is that they are not specific for a certain cancer entity and that various (patho)physiological conditions may influence tumor marker concentrations.^44,45 In the validation cohort, only one patient was incorrectly classified as having NDB because of elevated tumor marker concentrations due to comorbidity. This suggests that the effect of this limitation is minimal when using the STOP model.

Both clinicians and patients may be reluctant to stop ICI therapy, even if only a small chance of DCB is present. Therefore, the specificity and PPV of our STOP model by itself (96% and 88%, respectively) may not be high enough for treatment discontinuation. When our model was used in combination with RECIST, however, a specificity and PPV of 100% were obtained. This paves the way for a prospective clinical trial using the combination of the STOP model with RECIST evaluation. In patients with an NDB prediction by STOP and early PD, our recommendation would be to stop ICIs. If results are contradictory (i.e., PD with a DCB prediction by STOP, or PR/SD with an NDB prediction by STOP), we would advise close monitoring of these patients while awaiting the next radiological evaluation. In patients with a DCB prediction and CR/PR/SD, ICIs can be continued. A prospective clinical trial should ultimately prove clinical utility by confirming the early window of opportunity to switch to another treatment, the reduction of immune-related adverse events, and the cost-effectiveness of this approach. Future treatment options may include antibody-drug conjugates (ADCs) or combinations of targeted therapies with immunotherapy. ADCs represent a promising therapeutic strategy that combines the specificity of monoclonal antibodies with the cytotoxic potential of chemotherapeutic agents, targeting specific antigens expressed on cancer cells. ADCs have shown significant efficacy in various solid tumors, including NSCLC.^46,47 Furthermore, the integration of targeted therapies with immune checkpoint inhibitors is being explored to potentially overcome resistance mechanisms and enhance anti-tumor responses.⁴⁸
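The combined recommendation described above can be written out as a small decision rule. The following Python sketch is illustrative only and is not part of the STOP model itself: the function name, the `Action` labels, and the string encodings of the inputs are assumptions introduced here, and the handling of a CR with an NDB prediction (treated as a discordant result) is also an assumption, since that combination is not explicitly addressed in the text.

```python
from enum import Enum


class Action(Enum):
    STOP = "stop ICIs"
    MONITOR = "monitor closely and await the next radiological evaluation"
    CONTINUE = "continue ICIs"


def combined_recommendation(stop_prediction: str, recist: str) -> Action:
    """Map a STOP prediction ('NDB' or 'DCB') and a RECIST category
    ('CR', 'PR', 'SD' or 'PD') to the recommendation proposed in the text.

    Hypothetical helper: names and encodings are illustrative assumptions.
    """
    stop_prediction = stop_prediction.upper()
    recist = recist.upper()
    if stop_prediction not in {"NDB", "DCB"}:
        raise ValueError(f"unknown STOP prediction: {stop_prediction}")
    if recist not in {"CR", "PR", "SD", "PD"}:
        raise ValueError(f"unknown RECIST category: {recist}")

    predicts_ndb = stop_prediction == "NDB"
    progressive = recist == "PD"

    if predicts_ndb and progressive:
        # Concordant negative result: NDB prediction and early PD.
        return Action.STOP
    if not predicts_ndb and not progressive:
        # Concordant positive result: DCB prediction and CR/PR/SD.
        return Action.CONTINUE
    # Discordant results (assumption: CR with an NDB prediction is
    # treated the same way as PR/SD with an NDB prediction).
    return Action.MONITOR


print(combined_recommendation("NDB", "PD").value)
print(combined_recommendation("DCB", "PD").value)
```

Only the concordant NDB-plus-PD branch leads to discontinuation, which reflects the intent of reserving that decision for the situation in which both assessments agree.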

Conclusion

In conclusion, the STOP model based on CYFRA, CEA, and NSE is capable of predicting which metastatic NSCLC patients treated with ICIs will experience no durable benefit in a real-world cohort in line with the population of intended use. When the STOP model was combined with RECIST evaluations to increase specificity and PPV, 1 in 5 non-responders could be identified with a specificity and PPV of 100%. A clinician will be reluctant to stop ICI treatment when there is a small chance of response (i.e., a risk of incorrectly discontinuing ICI treatment). Therefore, we believe that although the sensitivity is relatively low, this model is of added value because treatment can be safely discontinued in 1 out of 5 non-responders. Furthermore, our model is of added value for patients suspected of pseudoprogression. Although this is a relatively rare phenomenon, ICI treatment is continued in many patients with early progression because of the small chance of pseudoprogression. Since our model is capable of predicting no durable benefit in the long term, it can guide clinicians to stop treatment early and confirm the absence of pseudoprogression.
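The trade-off between a high specificity and a low sensitivity can be made concrete with the standard definitions of these metrics, taking NDB as the positive class. The Python sketch below uses hypothetical counts chosen only to mirror the "1 in 5 non-responders at 100% specificity and PPV" pattern; they are not the counts of the validation cohort, and the function name is an assumption introduced for illustration.

```python
from typing import Dict, Optional


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # Return None when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else None


def ndb_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Optional[float]]:
    """Sensitivity, specificity and PPV with NDB as the positive class.

    tp: NDB patients predicted as NDB     fn: NDB patients predicted as DCB
    fp: DCB patients predicted as NDB     tn: DCB patients predicted as DCB
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
    }


# Hypothetical counts (NOT study data): 100 patients with NDB, 100 with DCB,
# 20 of the NDB patients flagged and no DCB patient flagged.
example = ndb_metrics(tp=20, fp=0, tn=100, fn=80)
print(example)  # {'sensitivity': 0.2, 'specificity': 1.0, 'ppv': 1.0}
```

With no false positives, specificity and PPV both equal 1.0 regardless of how many non-responders are missed, which is why the sensitivity has to be reported alongside them.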

Our study successfully validated a model that repurposes well-known serum tumor markers into a decision support tool using state-of-the-art machine learning techniques, and that can guide the decisions of patients and clinicians as early as 6 weeks after treatment initiation. Our findings provide a ready-to-use and easy-to-implement decision tool based on low-priced, robust, and generally available serum tumor markers.

Supplemental Material

Supplemental Material - External validation of a serum tumor marker algorithm for early prediction of no durable benefit to immunotherapy in metastastic non-small cell lung carcinoma

Supplemental Material for External validation of a serum tumor marker algorithm for early prediction of no durable benefit to immunotherapy in metastastic non-small cell lung carcinoma by Milou M. F. Schuurbiers, Freek A. van Delft, Hendrik Koffijberg, Maarten J. IJzerman, Kim Monkhorst, Marjolijn J. L. Ligtenberg, Daan van den Broek, Huub H. van Rossum, and Michel M. van den Heuvel in Tumor Biology.

Statements and declarations

Footnotes

Acknowledgements

We thank the Department of Laboratory Medicine of the Radboud university medical center, and in particular Teun van Herwaarden for his help in retrieving the serum tumor marker results of the RUMC patients.

Authors’ contributions

MMFS: conceptualization, methodology, software, data curation, formal analysis, and writing—original draft. FAD: conceptualization, methodology, software, and writing—review and editing. HK: writing—review and editing. KM: writing—review and editing. MJLL: writing—review and editing. DB: writing—review and editing. HHR: conceptualization, supervision, and writing—review and editing. MMH: conceptualization, supervision, and writing—review and editing.

Declaration of conflicting interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: MJLL received consulting fees from AstraZeneca, GlaxoSmithKline, Illumina, Janssen Pharmaceuticals, Lilly, and Merck Sharp & Dohme. None of these relations were related to this study, and all fees were paid to the institution. KM received a research grant from AstraZeneca; non-financial support from Roche, Takeda, Pfizer, PGDx, and Delfi; speaker fees from MSD, Roche, AstraZeneca, and Benecke; and consultant fees from Pfizer, BMS, Roche, MSD, Abbvie, AstraZeneca, Diaceutics, Lilly, Bayer, Boehringer Ingelheim, Merck, and Amgen. DB received research funding from Delfi. HvR is owner and director of Huvaros B.V. HvR has stocks in SelfSafeSure Blood Collections B.V. MMH received an unrestricted grant from Roche Diagnostics. All remaining authors have declared no conflicts of interest.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Ethical considerations

This study was approved by the local institutional review board of the Netherlands Cancer Institute (IRBd21-149) and the medical research ethics committee of the Radboud university medical center (2021–13207).

Consent for publication

All authors approved the final version of this manuscript for publication.

ORCID iD

Milou M. F. Schuurbiers

Supplemental Material

Supplemental material for this article is available online.

Appendix

References

1. Topalian, Hodi, Brahmer, et al. Safety, activity, and immune correlates of anti-PD-1 antibody in cancer. N Engl J Med 2012; 366: 2443–2454.

2. Brahmer, Tykodi, Chow, et al. Safety and activity of anti-PD-L1 antibody in patients with advanced cancer. N Engl J Med 2012; 366: 2455–2465.

3. Le, Uram, Wang, et al. PD-1 blockade in tumors with mismatch-repair deficiency. N Engl J Med 2015; 372: 2509–2520.

4. Howlader, Forjaz, Mooradian, et al. The effect of advances in lung-cancer treatment on population mortality. N Engl J Med 2020; 383: 640–649.

5. de Castro, Kudaba, Wu Y-L, et al. Five-year outcomes with pembrolizumab versus chemotherapy as first-line therapy in patients with non–small-cell lung cancer and programmed death ligand-1 tumor proportion score ≥1% in the KEYNOTE-042 study. J Clin Oncol 2022; 41: 1986–1991.

6. Brahmer, Lee J-S, Ciuleanu T-E, et al. Five-year survival outcomes with nivolumab plus ipilimumab versus chemotherapy as first-line treatment for metastatic non–small-cell lung cancer in CheckMate 227. J Clin Oncol 2023; 41: 1200.

7. Camidge, Doebele, Kerr. Comparing and contrasting predictive biomarkers for immunotherapy and targeted therapy of NSCLC. Nat Rev Clin Oncol 2019; 16: 341–355.

8. Planchard, Popat, Kerr, et al. Metastatic non-small cell lung cancer: ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Ann Oncol 2018; 29: iv192–iv237.

9. Eisenhauer, Therasse, Bogaerts, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 2009; 45: 228–247.

10. Ricciuti, Wang, Alessi, et al. Association of high tumor mutation burden in non–small cell lung cancers with increased immune infiltration and improved clinical outcomes of PD-L1 blockade across PD-L1 expression levels. JAMA Oncol 2022; 8: 1160–1168.

11. Tan, Aguiar, Haaland, et al. Comparative effectiveness of immune-checkpoint inhibitors for previously treated advanced non-small cell lung cancer – a systematic review and network meta-analysis of 3024 participants. Lung Cancer 2018; 115: 84–88.

12. Ramos-Paradas, Hernández-Prieto, Lora, et al. Tumor mutational burden assessment in non-small-cell lung cancer samples: results from the TMB(2) harmonization project comparing three NGS panels. J Immunother Cancer 2021; 9: e001904.

13. Haragan, Field, Davies MPA, et al. Heterogeneity of PD-L1 expression in non-small cell lung cancer: implications for specimen sampling in predicting treatment response. Lung Cancer 2019; 134: 79–84.

14. de Kock, Borne BVD, Soud, et al. Circulating biomarkers for monitoring therapy response and detection of disease progression in lung cancer patients. Cancer Treat Res Commun 2021; 28: 100410.

15. Arrieta, Varela-Santoyo, Cardona, et al. Association of carcinoembryonic antigen reduction with progression-free and overall survival improvement in advanced non-small-cell lung cancer. Clin Lung Cancer 2021; 22: 510–522.

16. Dal Bello MG, Filiberti, Alama, et al. The role of CEA, CYFRA21-1 and NSE in monitoring tumor response to nivolumab in advanced non-small cell lung cancer (NSCLC) patients. J Transl Med 2019; 17: 74.

17. Lang, Horner, Brehm, et al. Early serum tumor marker dynamics predict progression-free and overall survival in single PD-1/PD-L1 inhibitor treated advanced NSCLC – a retrospective cohort study. Lung Cancer 2019; 134: 59–65.

18. van den Heuvel, Holdenrieder, Schuurbiers MMF, et al. Serum tumor markers for response prediction and monitoring of advanced lung cancer: a review focusing on immunotherapy and targeted therapies. Tumor Biol 2023, In press.

19. van Delft, Schuurbiers MMF, Muller, et al. Comparing modeling strategies combining changes in multiple serum tumor biomarkers for early prediction of immunotherapy non-response in non-small cell lung cancer. Tumor Biol 2023, In press.

20. Litière, Isaac, Vries EGED, et al. RECIST 1.1 for response evaluation apply not only to chemotherapy-treated patients but also to targeted cancer agents: a pooled database analysis. J Clin Oncol 2019; 37: 1102–1110.

21. An M-W, Dong, Meyers, et al.; on behalf of the RECIST Steering Committee. Evaluating continuous tumor measurement-based metrics as phase II endpoints for predicting overall survival. JNCI: J Natl Cancer Inst 2015; 107.

22. Selleck, Senthil, Wall. Making meaningful clinical use of biomarkers. Biomark Insights 2017; 12: 1177271917715236.

23. Dobbin, Cesano, Alvarez, et al. Validation of biomarkers to predict response to immunotherapy in cancer: volume II – clinical validation and regulatory considerations. J Immunother Cancer 2016; 4: 77.

24. Rizvi, Hellmann, Snyder, et al. Cancer immunology. Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer. Science 2015; 348: 124–128.

25. Moons, Altman, Reitsma, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015; 162: W1–W73.

26. van Delft, Schuurbiers, Muller, et al. Modeling strategies to analyse longitudinal biomarker data: an illustration on predicting immunotherapy non-response in non-small cell lung cancer. Heliyon 2022; 8: e10932.

27. Riley, Debray TPA, Collins, et al. Minimum sample size for external validation of a clinical prediction model with a binary outcome. Stat Med 2021; 40: 4230–4251.

28. Wickham. ggplot2: elegant graphics for data analysis. New York: Springer-Verlag, 2016.

29. Kassambara A, Kosinski M, Biecek P. survminer: drawing survival curves using ‘ggplot2’. 2021.

30. Therneau TM, Grambsch PM. Modeling survival data: extending the Cox model. New York: Springer, 2000.

31. Marron, Ryan, Reddy, et al. Considerations for treatment duration in responders to immune checkpoint inhibitors. J Immunother Cancer 2021; 9: e001901.

32. Eun, Kim, Sun, et al. Risk factors for immune-related adverse events associated with anti-PD-1 pembrolizumab. Sci Rep 2019; 9: 14039.

33. Puzanov, Diab, Abdallah, et al. Managing toxicities associated with immune checkpoint inhibitors: consensus recommendations from the Society for Immunotherapy of Cancer (SITC) Toxicity Management Working Group. J Immunother Cancer 2017; 5: 95.

34. Kessler, Park, Grizzle, et al. Cost of illness of stage IV non-small cell lung cancer (NSCLC) positive for programmed cell death ligand 1 (PD-L1) in the US. Expert Rev Pharmacoecon Outcomes Res 2023; 23: 55–61.

35. Zhang, Yuan, Chen, et al. Dynamics of serum tumor markers can serve as a prognostic biomarker for Chinese advanced non-small cell lung cancer patients treated with immune checkpoint inhibitors. Front Immunol 2020; 11: 1173.

36. Chen, Wen, Xia, et al. Association of dynamic changes in peripheral blood indexes with response to PD-1 inhibitor-based combination therapy and survival among patients with advanced non-small cell lung cancer. Front Immunol 2021; 12.

37. Collins, de Groot, Dutton, et al. External validation of multivariable prediction models: a systematic review of methodological conduct and reporting. BMC Med Res Methodol 2014; 14: 40.

38. Donker, Schuuring, Heitzer, et al. Decoding circulating tumor DNA to identify durable benefit from immunotherapy in lung cancer. Lung Cancer 2022; 170: 52–57.

39. Nabet, Esfahani, Moding, et al. Noninvasive early identification of therapeutic benefit from immune checkpoint inhibition. Cell 2020; 183: 363–376.e313.

40. Molina, Holdenrieder, Auge, et al. Diagnostic relevance of circulating biomarkers in patients with lung cancer. Cancer Biomarkers 2010; 6: 163–178.

41. Ricciuti, Jones, Severgnini, et al. Early plasma circulating tumor DNA (ctDNA) changes predict response to first-line pembrolizumab-based therapy in non-small cell lung cancer (NSCLC). J Immunother Cancer 2021; 9: e001504.

42. Thompson, Carpenter, Silva, et al. Serial monitoring of circulating tumor DNA by next-generation gene sequencing as a biomarker of response and survival in patients with advanced NSCLC receiving pembrolizumab-based therapy. JCO Precision Oncology 2021; 5: 510–524.

43. Guibert, Jones, Beeler, et al. Targeted sequencing of plasma cell-free DNA to predict response to PD1 inhibitors in advanced non-small cell lung cancer. Lung Cancer 2019; 137: 1–6.

44. Chen, Tao, Zhang, et al. Elevated squamous cell carcinoma antigen, cytokeratin 19 fragment, and carcinoembryonic antigen levels in diabetic nephropathy. Int J Endocrinol 2017; 2017: 5304391.

45. Anderson, Reilly, Shashaty MGS, et al. Admission plasma levels of the neuronal injury marker neuron-specific enolase are associated with mortality and delirium in sepsis. J Crit Care 2016; 36: 18–23.

46. Verma, Breadner, Raphael. ‘Targeting’ improved outcomes with antibody-drug conjugates in non-small cell lung cancer – an updated review. Curr Oncol 2023; 30: 4329–4350.

47. Reyes, Pharaon, Mohanty, et al. Arising novel agents in lung cancer: are bispecifics and ADCs the new paradigm? Cancers 2023; 15: 3162.

48. Saha, Fojtů, Nagar, et al. Antibody nanoparticle conjugate–based targeted immunotherapy for non–small cell lung cancer. Sci Adv 2024; 10: eadi2046.
