Abstract
Background:
The population impact of colorectal cancer (CRC) screening depends on test performance and uptake, yet the comparative effectiveness of colonoscopy (CS), flexible sigmoidoscopy (FS), fecal immunochemical test (FIT), and guaiac-based fecal occult blood test (gFOBT) remains uncertain.
Objectives:
To compare the 10-year effects of CS, FS, FIT, and gFOBT on CRC incidence, CRC mortality, and all-cause mortality.
Design:
A systematic review and network meta-analysis.
Data sources and methods:
We searched PubMed, EMBASE, and Cochrane CENTRAL from inception to June 13, 2025, for randomized controlled trials enrolling average-risk adults comparing CS, FS, FIT, or gFOBT with no screening (NS) or each other. Risk ratios (RRs) with 95% confidence intervals (CI) were synthesized using a frequentist random-effects model. Intention-to-treat (ITT) analyses estimated invitation-based population effectiveness, with exploratory per-protocol (PP) analyses serving as a complementary assessment for screening completers (high risk of bias and very low certainty). The sparse network lacked closed loops, precluding formal inconsistency assessment.
Results:
In ITT analyses, CS (RR 0.83, 95% CI 0.72–0.96) and FS (0.78, 0.74–0.83), but not FIT (limited evidence) or gFOBT, reduced CRC incidence versus NS. For CRC mortality, FS (0.74, 0.67–0.83) and gFOBT (0.87, 0.79–0.95), but not CS or FIT, reduced risk. No strategy reduced all-cause mortality. In exploratory PP analyses, CS (0.78, 0.62–0.99) and FS (0.72, 0.66–0.79) reduced CRC incidence. For CRC mortality, CS (0.52, 0.31–0.85), FS (0.58, 0.49–0.70), and gFOBT (0.71, 0.61–0.82) reduced risk.
Conclusion:
CS reduced CRC incidence in ITT analyses but showed limited population-level effectiveness when uptake was low. FS reduced CRC incidence and mortality in ITT analyses, and gFOBT reduced CRC mortality. Screening strategies should be selected based on test characteristics, real-world participation, and healthcare capacity. PP findings were exploratory and should not be used in isolation to guide recommendations.
Trial registration:
PROSPERO registration number CRD420251127511.
Plain language summary
Colorectal cancer is common and can be life-threatening, but screening can help prevent it or find it early. Several screening options are used, including colonoscopy, flexible sigmoidoscopy, and stool tests such as the fecal immunochemical test and guaiac-based fecal occult blood test. It is still debated which approach provides the greatest long-term benefit, especially because not everyone who is invited to screening actually participates. In this study, we combined results from nine large clinical trials that followed adults for about 10 years. We compared screening strategies in two ways. First, we looked at what happens when people are invited to screening (a real-world view that includes people who do not participate). Second, we looked at outcomes only among people who completed the test (an exploratory view of how screening may work when it is completed). In the invitation-based analyses, colonoscopy and sigmoidoscopy reduced new colorectal cancer cases, while sigmoidoscopy and guaiac-based fecal occult blood test reduced colorectal cancer deaths. None of the strategies clearly reduced all-cause mortality over this time frame. Among people who completed screening, colonoscopy and sigmoidoscopy were linked to larger reductions in colorectal cancer cases and deaths, but these findings should be interpreted cautiously. Overall, the most appropriate screening strategy may differ depending on the test’s characteristics, how many people are likely to participate, and the setting in which screening is offered.
Introduction
Colorectal cancer (CRC) remains a major global health concern. In 2020, nearly 2 million new cases and 1 million deaths were reported worldwide, making CRC the third most commonly diagnosed cancer and the second leading cause of cancer-related death.1,2 Although incidence and mortality have declined in Western countries, substantial disparities persist across regions, with a recent increase observed in Asia, which now accounts for over half of new cases and CRC-related deaths globally.2,3 These differences likely reflect variations in lifestyle, genetic background, socioeconomic status, access to care, and the organization of screening programs.3–5
Randomized clinical trials (RCTs) and observational studies have shown that appropriate CRC screening reduces long-term incidence and mortality. Common screening strategies include colonoscopy (CS), flexible sigmoidoscopy (FS), fecal immunochemical test (FIT), and guaiac-based fecal occult blood test (gFOBT), each differing in effectiveness, invasiveness, sensitivity, and acceptability. CS is widely regarded as the gold standard, enabling both visualization and removal of lesions throughout the entire colon, with a recommended 10-year interval after a negative high-quality examination.6 Although less invasive and more accessible than CS, FS examines only the left colon and has limited preventive reach.7 As a result, it is gradually being phased out as a primary screening strategy. Annual or biennial FIT is widely used for its simplicity and non-invasive nature, although it has variable sensitivity and a lower detection rate for proximal lesions.8 To date, no RCTs have directly evaluated the long-term effectiveness of FIT versus no screening (NS). Most existing RCTs assessing long-term outcomes have focused on biennial gFOBT, which was the standard stool-based test in earlier trials. However, a recent meta-analysis demonstrated that FIT is superior to gFOBT in terms of uptake and adherence, as well as detection of advanced neoplasia and CRC, and FIT is increasingly being adopted in place of gFOBT.9 Although interest is growing in multitarget stool DNA testing and CT colonography, their global implementation remains limited.10,11
Meanwhile, two large RCTs, NordICC and COLONPREV, have renewed debate about the benefits and limitations of CS.12,13 In NordICC, once-only CS reduced CRC incidence over 10 years in the intention-to-treat (ITT) analysis but did not significantly reduce CRC or all-cause mortality, likely reflecting low uptake.12 In the per-protocol (PP) analysis, however, CS produced larger reductions in CRC incidence and mortality. In COLONPREV, biennial FIT was noninferior to once-only CS for 10-year CRC incidence and mortality in the ITT analysis; in the PP analysis, CS yielded significantly lower risks.13
Together, these findings illustrate a persistent gap between invitation-based population effectiveness and completion-based exploratory estimates: ITT analyses reflect population-level effects that incorporate uptake and adherence, whereas PP analyses, though potentially vulnerable to selection bias, may help contextualize outcomes among participants who complete screening. A unified framework that integrates invitation-based population effectiveness with complementary exploratory estimates among screening completers has not been fully established. This network meta-analysis (NMA) addresses this gap by jointly evaluating CS, FS, FIT, and gFOBT in ITT analyses, while using PP analyses as a complementary exploratory assessment, with follow-up harmonized to approximately 10 years to reflect the recommended CS interval. To our knowledge, this is the first NMA to comprehensively compare these strategies based on primary RCT data.
Methods
Search strategy and study selection
This systematic review and NMA was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for network meta-analyses (PRISMA-NMA). We performed a comprehensive literature search of PubMed, EMBASE, and the Cochrane Central Register of Controlled Trials from inception to June 13, 2025. No restrictions were applied for language, year, or publication status. The study protocol was registered with PROSPERO, CRD420251127511. The search strategies are presented in Supplemental Table 1. We included RCTs enrolling average-risk adults aged 45–80 years that evaluated the long-term effectiveness of CRC screening with CS, FS, FIT, or gFOBT. Eligible trials included comparisons of invitation to screening versus NS or usual care (analyzed as a common NS node), and head-to-head comparisons of screening strategies. Studies were included if they reported at least one of the following primary outcomes: CRC incidence, CRC mortality, or all-cause mortality. We excluded studies involving complex interventions not aligned with current guidelines, such as those combining FS with a one-time gFOBT or referring gFOBT-positive individuals to FS, as well as studies incorporating risk assessment scores or questionnaires that could modify the effect of the screening strategy. We also excluded studies in high-risk populations (e.g., inflammatory bowel disease, Lynch syndrome, familial adenomatous polyposis) and trials in which aspirin was the primary intervention.14,15 Because CS is recommended every 10 years, we excluded trials with a mean or median follow-up of less than 10 years.6
In the included CS trials, lesions were managed during or after the procedure, and many trials incorporated post-CS surveillance. In FS trials, patients with high-risk findings (e.g., large polyps, multiple adenomas, or high-grade dysplasia)—as well as those with positive FIT or gFOBT results—were referred for CS and managed similarly.
Data extraction
Two reviewers (M.K. and A.S.) independently screened titles/abstracts and assessed full texts for eligibility, resolving discrepancies through discussion. Data extraction was performed independently, including study title, first author, publication year, country, study design, follow-up duration, age, sample size, screening modality, and outcome definitions (CRC incidence, CRC mortality, and all-cause mortality), as well as the number of events for each binary outcome. We also extracted data on screening uptake, adherence across repeated screening rounds, follow-up CS after positive screening results, and procedure-related complications when reported. For total follow-up time, reported person-years were used whenever available. When person-years were not provided, they were estimated from the number of events and incidence or mortality rates. If neither rates nor person-years were reported, we contacted study authors for additional information. For studies including PP analyses but not PP person-years, we estimated person-years by proportionally scaling those from the ITT population.
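The person-year imputations described above reduce to simple arithmetic. The following is a minimal Python sketch, not the authors' code: it assumes rates are reported per 100,000 person-years (the unit is kept as a parameter because reporting units may differ across trials), and the function names and all input values are hypothetical. The per-protocol scaling assumes that screened participants accrued the same mean follow-up as the full ITT arm.

```python
def person_years_from_rate(events: int, rate: float, per: float = 100_000) -> float:
    """Back-calculate person-years from an event count and a reported rate.

    The rate is expressed per `per` person-years (assumed 100,000 by default), so
    events = rate / per * person_years, hence person_years = events * per / rate.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return events * per / rate


def scale_person_years_to_pp(itt_person_years: float, n_itt: int, n_pp: int) -> float:
    """Approximate per-protocol person-years by proportionally scaling ITT person-years.

    Assumes screened participants had the same mean follow-up as the whole ITT arm.
    """
    return itt_person_years * n_pp / n_itt


# Hypothetical example: 150 CRC cases at a reported rate of 75 per 100,000 person-years
py_itt = person_years_from_rate(events=150, rate=75.0)
print(py_itt)  # 200000.0

# Hypothetical: 4,000 of 10,000 invited participants were screened
print(scale_person_years_to_pp(py_itt, n_itt=10_000, n_pp=4_000))  # 80000.0
```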
Outcome definitions and assessment
CRC diagnosis was ascertained through International Classification of Diseases codes, national or regional cancer registries, pathology reports, and clinical records. Most trials used automated record linkage and independent expert review to ensure diagnostic accuracy. CRC incidence was defined as newly diagnosed cases during follow-up. CRC-related deaths were identified through national death registries, medical records, or questionnaires, and in several trials the underlying cause of death was adjudicated by blinded committees.
Definition of screening interventions and follow-up duration
CS is recommended every 10 years for average-risk populations; however, few trials reported outcomes at exactly 10 years. Therefore, the primary analysis included trials across all screening strategies with follow-up durations of 10–13 years. During this period, CS and FS were generally performed once, although one FS trial included two rounds. In the primary analysis, all FS trials were grouped within a single FS node, including the trial with two screening rounds; a sensitivity analysis excluding this trial was performed to assess the impact of differences in FS screening frequency. FIT and gFOBT were administered biennially. One trial reported both annual and biennial gFOBT; however, to maintain consistency with the screening intervals used in all other trials, only the biennial arm was included in the analysis. In the primary analysis, FIT and gFOBT were modeled as separate nodes to preserve clinical distinctions between stool-based screening strategies. FIT and gFOBT were combined into a single node (“FIT/gFOBT”) only in prespecified sensitivity analyses to examine the robustness of the findings under an alternative stool-test grouping and to improve network connectivity where needed. In addition, they were combined in the exploratory PP analysis of all-cause mortality, in which separate modeling produced a disconnected network that precluded estimation of all comparisons. The handling of FIT and gFOBT nodes across analyses is summarized in Supplemental Table 2. Secondary analyses used trials with follow-up durations of 13–18 and 18–22 years to investigate outcomes at 15 and 20 years, respectively.
Risk of bias and certainty of evidence assessment
Risk of bias was assessed per outcome using the Cochrane Risk of Bias 2.0 tool.16 Certainty of evidence was evaluated with the Confidence in Network Meta-Analysis (CINeMA) approach, which is based on the GRADE (Grading of Recommendations Assessment, Development and Evaluation) framework.17 Both assessments were conducted independently by two reviewers (M.K. and A.S.).
Data synthesis and statistical analysis
NMAs were performed using a frequentist framework in Stata (version 16.1, network package; StataCorp LLC, College Station, TX, USA). In addition to the primary ITT analysis, an exploratory PP analysis was conducted, in which participants in the intervention group who actually underwent the assigned screening examination were included, and outcome events were extracted among screened individuals whenever available. This analysis was intended to explore the potential effect among individuals who underwent screening; however, because adherence was not randomized, these estimates are susceptible to selection bias and were interpreted as hypothesis-generating. Separate NMAs were performed for ITT and PP populations using the corresponding event counts and numbers of participants. A consistency model with random effects was adopted as the primary analytic approach, assuming transitivity across trials; a fixed-effects model was additionally applied as a sensitivity analysis to assess the robustness of the results. Heterogeneity was quantified using τ² and I², with I² values of approximately 25%, 50%, and 75% considered to indicate low, moderate, and high heterogeneity, respectively.18 We planned to evaluate global and local inconsistency using the design-by-treatment interaction model and node-splitting; however, the absence of closed loops precluded formal assessment. Publication bias was explored with comparison-adjusted funnel plots, Egger’s regression test, and Begg’s rank correlation test. Risk ratios (RRs) with 95% confidence intervals (CIs) were used as the common effect measure in both pairwise meta-analyses and NMAs to ensure compatibility and methodological consistency across trials. Pairwise meta-analyses were performed for all available direct comparisons using random-effects models, providing direct estimates to complement the network analyses.
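To illustrate how direct comparisons are pooled and how τ² and I² arise, the sketch below computes log risk ratios from 2×2 counts and combines them under a random-effects model. It uses the DerSimonian–Laird moment estimator as a simple, widely used choice; this is an illustrative assumption, and the estimator used by the Stata network package may differ. The function names and trial counts are hypothetical.

```python
import math


def log_rr(events_a, n_a, events_b, n_b):
    """Log risk ratio (A vs B) and its large-sample standard error from 2x2 counts."""
    lrr = math.log((events_a / n_a) / (events_b / n_b))
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    return lrr, se


def dersimonian_laird(effects, ses):
    """Random-effects pooling of log effects; returns pooled RR, 95% CI, tau^2, I^2 (%)."""
    w = [1 / s**2 for s in ses]                            # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0      # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0   # share of variability beyond chance
    w_re = [1 / (s**2 + tau2) for s in ses]                # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    lo, hi = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled
    return math.exp(pooled), (math.exp(lo), math.exp(hi)), tau2, i2


# Hypothetical trials: (events screened, n screened, events NS, n NS)
trials = [(300, 40_000, 800, 80_000), (250, 30_000, 420, 40_000), (500, 50_000, 1300, 100_000)]
effects, ses = zip(*(log_rr(*t) for t in trials))
rr, ci, tau2, i2 = dersimonian_laird(list(effects), list(ses))
print(f"RR {rr:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), tau^2 = {tau2:.4f}, I^2 = {i2:.0f}%")
```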
Additionally, treatment rankings were based on the surface under the cumulative ranking curve (SUCRA). SUCRA summarizes the probabilities of each strategy occupying each possible rank into a single value, ranging from 0% (a strategy certain to rank last) to 100% (a strategy certain to rank first); it is not the probability of being the best strategy. Results were presented using forest plots and network league tables to summarize all pairwise comparisons and pooled effect estimates.
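The SUCRA calculation itself is short. In the sketch below, which uses hypothetical rank probabilities (in practice these are obtained by resampling from the fitted network model), SUCRA for each strategy is the average, over ranks 1 to a − 1, of the cumulative probability of ranking at or above that position, where a is the number of nodes. The function name is illustrative.

```python
def sucra(rank_probs):
    """SUCRA (%) from a rank-probability matrix.

    rank_probs[j][k] is the probability that strategy j has rank k + 1 (rank 1 = best).
    SUCRA_j is the mean over k = 1..a-1 of the cumulative probability P(rank <= k).
    """
    a = len(rank_probs[0])                 # number of possible ranks (= number of nodes)
    scores = []
    for probs in rank_probs:
        cumulative, running = [], 0.0
        for p in probs[:-1]:               # the final cumulative value is always 1, so it is excluded
            running += p
            cumulative.append(running)
        scores.append(100 * sum(cumulative) / (a - 1))
    return scores


# Hypothetical rank probabilities for three nodes (each row sums to 1)
probs = [
    [0.7, 0.2, 0.1],   # strategy A
    [0.2, 0.6, 0.2],   # strategy B
    [0.1, 0.2, 0.7],   # no screening
]
print([round(s, 1) for s in sucra(probs)])  # [80.0, 50.0, 20.0]
```

Note that strategy A, with only a 70% probability of ranking first, still has a SUCRA of 80%, which is why SUCRA should not be read as the probability of being best.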
Sensitivity analyses included combining FIT and gFOBT into a single node (FIT/gFOBT); excluding the trial that enrolled participants aged ⩾75 years, for whom screening decisions are typically individualized; excluding the trial with two FS screening rounds to assess the impact of differences in FS screening frequency; and reanalyzing the data using a fixed-effects model. Because follow-up duration varied across trials (approximately 10–13 years), we also performed a sensitivity analysis using person-years as the denominator and person-year-based rate ratios as the effect measure, to evaluate whether differences in follow-up time materially affected the findings of the primary analyses based on cumulative event counts. In a further exploratory sensitivity analysis, we conducted an ITT network meta-regression including trial-level screening uptake (defined as completion of at least one invited screening round) as a covariate. Uptake was summarized as the mean across active screening arms (excluding NS), mean-centered, and scaled per 10% increase. We reported exp(β), which represents the multiplicative change in the RR per 10% higher uptake.
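The arithmetic behind the uptake covariate and exp(β) can be sketched as follows. The uptake values and the coefficient β are hypothetical, and "10% increase" is read here as 10 percentage points, which is an assumption. The sketch only shows centering, scaling, and the conversion of a log-scale meta-regression coefficient into a multiplicative change in the RR; the function names are illustrative.

```python
import math


def center_and_scale(uptakes_pct, per=10.0):
    """Mean-center trial-level uptake (%) and express it in units of `per` percentage points."""
    mean = sum(uptakes_pct) / len(uptakes_pct)
    return [(u - mean) / per for u in uptakes_pct], mean


def rr_at_uptake(log_rr_at_mean, beta, covariate):
    """RR implied by a log-scale meta-regression: log RR = alpha + beta * covariate."""
    return math.exp(log_rr_at_mean + beta * covariate)


# Hypothetical trial-level uptake (mean across active arms, %)
covariates, mean_uptake = center_and_scale([40.0, 60.0, 80.0])
print(mean_uptake, covariates)   # 60.0 [-2.0, 0.0, 2.0]

beta = -0.05                     # hypothetical coefficient per 10 points higher uptake
print(round(math.exp(beta), 3))  # 0.951: the RR is multiplied by about 0.95 per 10 points
# RR at 20 points above mean uptake, if the RR at mean uptake were 0.80
print(round(rr_at_uptake(math.log(0.80), beta, 2.0), 3))  # 0.724
```

Because the covariate is centered, the intercept term corresponds to the RR at the average uptake across trials, and exp(β) below 1 would indicate larger relative benefit in trials with higher uptake.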
For the ITT analyses, we calculated absolute risk reductions (ARRs) per 1000 invited participants and the number needed to screen (NNS) using event rates in the NS group and pooled RRs from the NMA.
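The absolute-effect calculation can be sketched in a few lines, assuming the ARR is the NS-group risk multiplied by (1 − RR) and the NNS is its reciprocal. The NS-group risk of 15 per 1000 below is a hypothetical illustration, not a pooled value from the included trials; only the RR of 0.78 is taken from the ITT results for FS. The function name is illustrative.

```python
def absolute_effects(baseline_risk, rr):
    """ARR per 1000 invited and NNS from the NS-group risk and a pooled RR.

    ARR = baseline_risk * (1 - RR); NNS = 1 / ARR.
    Returns inf for the NNS when no benefit is estimated (ARR <= 0).
    """
    arr = baseline_risk * (1 - rr)
    nns = 1 / arr if arr > 0 else float("inf")
    return arr * 1000, nns


# Hypothetical NS-group 10-year CRC risk of 15 per 1000, with the ITT RR for FS (0.78)
arr_per_1000, nns = absolute_effects(0.015, 0.78)
print(round(arr_per_1000, 2), round(nns))  # 3.3 303
```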
Results
A total of 2374 records were identified, of which 9 studies (23 publications)12,13,19–37 met eligibility criteria and were included in the final NMA (Figure 1). Characteristics of the included trials are summarized in Table 1 and Supplemental Tables 3 and 4. In the primary analysis with approximately 10 years of follow-up, the ITT population comprised 54,552 participants in the CS group, 161,963 in the FS group (including 77,445 screened twice), 26,719 in the FIT group, 122,778 in the gFOBT group, and 464,584 in the NS group.

Figure 1. PRISMA flow diagram of study selection.
Table 1. Characteristics of randomized controlled trials of CRC screening, stratified by follow-up duration (years).
Proportion completing ⩾1 screening.
Proportion completing ⩾80% of screenings.
FS performed as initial screening; repeat at 3 or 5 years.
Proportion completing both rounds.
Proportion completing 100% of screenings.
CRC, colorectal cancer; CS, colonoscopy; FIT, fecal immunochemical test; FS, flexible sigmoidoscopy; gFOBT, guaiac-based fecal occult blood test; IQR, interquartile range; NS, no screening.
The screening network structure is shown in Figure 2, with NS used as the common comparator. Direct comparisons between screening strategies were limited, with only the COLONPREV trial comparing CS and FIT.13 Risk of bias assessment is presented in Supplemental Figure 1, and detailed GRADE ratings are summarized in Supplemental Tables 5 and 6. Overall, all studies were judged to have some concerns in ITT analyses and to be at high risk of bias in PP analyses. In the primary analysis, certainty of evidence was generally low for direct comparisons versus NS and very low for indirect comparisons between screening strategies in ITT analyses, whereas it was very low throughout PP analyses. In ITT analyses, downgrading mainly reflected network limitations, particularly the absence of closed loops, conservative concerns regarding incoherence, and the limited number of FIT trials. In PP analyses, certainty was further lowered by additional risk of bias because estimates were based on actual screening uptake rather than randomized assignment, making them susceptible to selection bias. Accordingly, PP findings should be interpreted as exploratory and hypothesis-generating.
Between-study heterogeneity was minimal across outcomes, with τ² values close to zero in nearly all analyses and the largest value observed in the PP analysis for CRC incidence (τ² = 0.003). However, formal assessment of inconsistency was not feasible because of the sparse network structure and lack of closed loops; therefore, comparisons based largely on indirect evidence should be interpreted cautiously. Assessment of publication bias was limited by the small number of included studies and the sparse network structure. Although comparison-adjusted funnel plots showed no marked asymmetry and Egger’s and Begg’s tests were non-significant across outcomes (p > 0.05; Supplemental Figures 2 and 3), these assessments were underpowered and therefore not highly informative.

Figure 2. Network structure of randomized controlled trials comparing screening strategies at 10-year follow-up. (a) CRC incidence; ITT analysis. (b) CRC mortality; ITT analysis. (c) All-cause mortality; ITT analysis. (d) CRC incidence; exploratory PP analysis. (e) CRC mortality; exploratory PP analysis. (f) All-cause mortality; exploratory PP analysis. Each node represents a screening strategy and each edge represents a direct comparison; node size and edge thickness reflect the number of contributing studies and direct comparisons, respectively. Most evidence is anchored to NS as the common comparator; only one trial (COLONPREV) provided a direct head-to-head comparison (CS vs FIT), and all other between-strategy comparisons were therefore indirect.
All included RCTs were conducted in Western countries and primarily enrolled average-risk, asymptomatic adults aged 45–75 years. Despite these broad similarities, the transitivity assumption should be interpreted cautiously, as screening strategies differed across trials in uptake, adherence, repeated screening structure, follow-up CS rates, and downstream surveillance. Uptake was lowest for CS (e.g., 42.0% in NordICC; 31.8% in COLONPREV)12,13 and similarly low for FIT (39.9%), whereas it was moderate to high for FS (65.1%–86.6%) and gFOBT (59.6%–89.9%; Table 1). Adherence across repeated screening rounds also varied across stool-based screening trials (Supplemental Table 7). In the FIT-based COLONPREV trial, participation declined across repeated rounds. In the gFOBT trial with round-specific data, participation remained relatively high in each round, although the proportion of participants completing repeated screening declined over time. Follow-up CS rates after positive screening results were generally high across strategies, indicating that diagnostic work-up was implemented (Table 2). Among participants with a positive test result, the proportion undergoing follow-up CS was approximately 80%–96% after positive FS, 95% after positive FIT, and 73%–84% after positive gFOBT; in certain gFOBT trials, some of the remaining participants underwent alternative diagnostic evaluation (e.g., barium enema). Among screened participants, the proportion undergoing follow-up CS was 5.0%–26.3% for FS, 16.0% for FIT, and 3.2%–4.0% for gFOBT. Downstream surveillance pathways also varied across trials, including differences in eligibility criteria, surveillance intervals, and applied guideline frameworks, which may have influenced long-term outcomes (Supplemental Table 3).
Finally, procedure-related complications (perforation and major bleeding) were uncommon and appeared to be reported more frequently after CS (primary or follow-up) than after screening FS (Table 2).
Table 2. Follow-up CS and procedure-related complications across screening trials.
“CS (follow-up)” indicates downstream diagnostic CS performed after a positive FS, FIT, or gFOBT result, not primary screening CS.
FIT-positive count derived from the as-screened population (n = 13,599), including crossover participants; does not represent the per-protocol FIT group alone.
Denominators derived from separate complication reports; differ from the primary screening population.
Events per procedure.
Complications for FS and follow-up CS were not separately extractable for the FS-only arm (reported for the combined cohort of FS-only plus FS + gFOBT).
Combined annual and biennial arms.
CS, colonoscopy; FIT, fecal immunochemical test; FS, flexible sigmoidoscopy; gFOBT, guaiac-based fecal occult blood test; NA, not applicable; NR, not reported.
NMA results
ITT analysis
In the ITT analysis, both CS and FS were associated with significant reductions in CRC incidence compared with NS (CS: RR 0.83, 95% CI 0.72–0.96; FS: RR 0.78, 95% CI 0.74–0.83; Figure 3(a); Supplemental Table 8(A)). FIT and gFOBT did not show significant reductions compared with NS (FIT: RR 0.90, 95% CI 0.73–1.12; gFOBT: RR 0.97, 95% CI 0.92–1.03). Because long-term randomized evidence for FIT was limited to a single trial directly comparing FIT with CS rather than NS, most comparative estimates involving FIT were indirect; thus, the absence of statistically significant findings for FIT should be interpreted as insufficient evidence rather than evidence of no effect. Compared with gFOBT, FS was associated with a lower estimated risk (RR 0.81, 95% CI 0.74–0.87).
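Because the network contains no closed loops, comparisons between active strategies rest on indirect evidence through the common NS comparator. The sketch below reproduces that logic with the Bucher adjusted indirect comparison, a standard single-comparator method used here as an approximation (the NMA estimate also depends on the shared heterogeneity variance). It recovers standard errors from the reported, rounded 95% CIs, assuming they are symmetric on the log scale, so its result of about 0.80 (0.74–0.87) approximates rather than reproduces the reported FS versus gFOBT estimate of 0.81 (0.74–0.87). The function names are illustrative.

```python
import math

Z = 1.96  # two-sided 95% normal quantile


def log_and_se(rr, lo, hi):
    """Recover log RR and its SE from a reported RR and 95% CI (assumes a symmetric log-scale CI)."""
    return math.log(rr), (math.log(hi) - math.log(lo)) / (2 * Z)


def bucher_indirect(a_vs_c, b_vs_c):
    """Indirect RR of A vs B through common comparator C: log RR_AB = log RR_AC - log RR_BC."""
    (la, sa), (lb, sb) = log_and_se(*a_vs_c), log_and_se(*b_vs_c)
    diff = la - lb
    se = math.sqrt(sa**2 + sb**2)  # variances add because the two estimates come from independent trials
    return math.exp(diff), math.exp(diff - Z * se), math.exp(diff + Z * se)


# Reported ITT CRC incidence estimates versus NS: (RR, lower 95% limit, upper 95% limit)
fs_vs_ns = (0.78, 0.74, 0.83)
gfobt_vs_ns = (0.97, 0.92, 1.03)
rr, lo, hi = bucher_indirect(fs_vs_ns, gfobt_vs_ns)
print(f"FS vs gFOBT: RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The indirect interval is wider than either direct interval because both sources of uncertainty are carried into the comparison, which is one reason indirect estimates were rated at lower certainty.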

Figure 3. Network forest plots comparing RRs for screening strategies at 10-year follow-up (ITT analysis). (a) CRC incidence. (b) CRC mortality. (c) All-cause mortality. Estimates to the left of the vertical line (RR <1) indicate lower risk for the first-listed strategy when the 95% CI does not cross 1. Certainty of evidence was generally low for direct comparisons versus NS and very low for indirect comparisons between active screening strategies; statistically significant findings for indirect comparisons should therefore be interpreted cautiously.
For CRC mortality, FS (RR 0.74, 95% CI 0.67–0.83) and gFOBT (RR 0.87, 95% CI 0.79–0.95) showed significant reductions compared with NS, whereas CS and FIT did not (CS: RR 0.92, 95% CI 0.69–1.21; FIT: RR 0.98, 95% CI 0.62–1.56). FS was also associated with a lower estimated risk than gFOBT (RR 0.86, 95% CI 0.75–0.98; Figure 3(b); Supplemental Table 8(A)). No intervention demonstrated a significant reduction in all-cause mortality compared with NS or other screening strategies (Figure 3(c); Supplemental Table 8(A)). Overall, comparisons between screening strategies were based largely on indirect evidence, with many estimates supported by low- or very-low-certainty evidence, and should therefore be interpreted cautiously.
Absolute effects for the ITT analyses are provided in Supplemental Table 9. For CRC incidence, ARRs were larger for FS and CS than for FIT or gFOBT (e.g., 3.37 and 2.62 fewer cases per 1000 invited, respectively). For mortality outcomes, absolute effects were generally smaller, and ARR and NNS values were difficult to interpret when the corresponding RR CIs included 1.0. This applied particularly to all-cause mortality, for which estimated absolute differences were small across strategies.
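Why ARR and NNS become hard to interpret when the RR CI includes 1.0 can be seen from a short calculation: the ARR interval then spans zero, so the NNS interval passes through infinity and splits into a number needed to screen for benefit and a number needed to harm. The NS-group CRC mortality of 5 per 1000 below is hypothetical; the RR interval (0.69–1.21) is the ITT estimate for CS. The function names are illustrative.

```python
def arr_ci_per_1000(baseline_risk, rr_lo, rr_hi):
    """ARR per 1000 at the RR confidence limits; the lower RR limit gives the largest benefit."""
    return baseline_risk * (1 - rr_hi) * 1000, baseline_risk * (1 - rr_lo) * 1000


def nns_limits(arr_lo_per_1000, arr_hi_per_1000):
    """Convert ARR limits (per 1000) to NNS limits; a negative value is a number needed to harm."""
    return 1000 / arr_hi_per_1000, 1000 / arr_lo_per_1000


# Hypothetical NS-group 10-year CRC mortality of 5 per 1000, with the ITT CS RR CI (0.69-1.21)
lo, hi = arr_ci_per_1000(0.005, 0.69, 1.21)
print(round(lo, 2), round(hi, 2))             # -1.05 1.55: the ARR interval spans zero
benefit_limit, harm_limit = nns_limits(lo, hi)
print(round(benefit_limit), round(harm_limit))  # 645 -952
```

Read together, these limits mean the data are compatible with anything from benefit (NNS of 645 or more) through no effect (NNS of infinity) to harm (number needed to harm of 952 or more), so a single NNS point estimate would be misleading.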
Exploratory PP analysis
In this exploratory PP analysis, which reflects a non-randomized within-trial comparison among screening completers, CS and FS were both associated with significantly lower CRC incidence than NS (CS: RR 0.78, 95% CI 0.62–0.99; FS: RR 0.72, 95% CI 0.66–0.79), whereas FIT and gFOBT were not (FIT: RR 1.17, 95% CI 0.78–1.76; gFOBT: RR 0.90, 95% CI 0.81–1.01; Figure 4(a); Supplemental Table 8(A)). CS and FS were each associated with a lower estimated risk than FIT (CS: RR 0.67, 95% CI 0.48–0.93; FS: RR 0.62, 95% CI 0.41–0.94), and FS was also associated with a lower estimated risk than gFOBT (RR 0.80, 95% CI 0.69–0.92).

Figure 4. Network forest plots comparing RRs for screening strategies at 10-year follow-up (exploratory PP analysis). (a) CRC incidence. (b) CRC mortality. (c) All-cause mortality. Estimates to the left of the vertical line (RR <1) indicate lower risk for the first-listed strategy when the 95% CI does not cross 1. All panels show exploratory PP estimates at high risk of bias and very low certainty of evidence; these findings should therefore be regarded as hypothesis-generating only. The all-cause mortality findings in panel (c) are particularly susceptible to selection bias and should not be interpreted causally.
For CRC mortality, CS, FS, and gFOBT were each associated with significantly lower risk than NS (CS: RR 0.52, 95% CI 0.31–0.85; FS: RR 0.58, 95% CI 0.49–0.70; gFOBT: RR 0.71, 95% CI 0.61–0.82). CS was also associated with a lower estimated risk than FIT (RR 0.21, 95% CI 0.05–0.90; Figure 4(b); Supplemental Table 8(A)). For all-cause mortality, FIT and gFOBT were combined into a single node (FIT/gFOBT) because the sparse evidence base produced a disconnected network when these strategies were modeled separately. All-cause mortality was significantly lower in all screened groups than in NS (CS: RR 0.66, 95% CI 0.58–0.75; FS: RR 0.82, 95% CI 0.79–0.84; FIT/gFOBT: RR 0.84, 95% CI 0.82–0.85), and CS was associated with a lower estimated risk than both FS and FIT/gFOBT (CS vs FS: RR 0.81, 95% CI 0.71–0.93; CS vs FIT/gFOBT: RR 0.79, 95% CI 0.70–0.90; Figure 4(c); Supplemental Table 8(B)). All PP findings should be regarded as hypothesis-generating; the all-cause mortality findings, in particular, are susceptible to selection bias and should not be interpreted causally.
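One way to see why the PP all-cause mortality estimates should not be read causally is to ask what all-cause RR would follow if screening changed only CRC deaths. The sketch below assumes non-CRC deaths are unaffected and uses a hypothetical CRC share of 3% of deaths in the NS group; under these assumptions, even the PP CRC mortality RR for CS (0.52) implies an all-cause RR close to 1, far from the observed 0.66. The function name is illustrative.

```python
def expected_all_cause_rr(crc_death_fraction, crc_mortality_rr):
    """All-cause RR expected if screening changed only CRC deaths.

    Deaths_screened / Deaths_NS = (1 - f) + f * RR_crc, where f is the share of
    NS-group deaths attributable to CRC and non-CRC deaths are assumed unchanged.
    """
    f = crc_death_fraction
    return (1 - f) + f * crc_mortality_rr


# Hypothetical: CRC accounts for 3% of deaths in the NS group over follow-up
expected = expected_all_cause_rr(0.03, 0.52)  # PP CRC mortality RR for CS
print(round(expected, 3))                     # 0.986
observed = 0.66                               # PP all-cause mortality RR for CS
print(observed < expected)                    # True: the gap implies differences in non-CRC deaths
```

A gap of this size implies that screened and unscreened participants differed in their risk of dying from other causes, which is the pattern expected from selection into screening rather than from a screening effect.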
SUCRA rankings
SUCRA-based rankings were explored as a descriptive summary rather than as a definitive hierarchy of screening strategies (Figure 5; Supplemental Tables 10 and 11). They should be interpreted cautiously, particularly in this sparse network, where rankings may appear more precise than the underlying estimates warrant. Caution is particularly warranted for rankings involving FIT, for which long-term randomized evidence was limited and most comparative estimates were indirect. Rankings from the PP analyses should also be interpreted carefully, as these analyses were at high risk of bias and very low certainty, and should be regarded as hypothesis-generating.

Figure 5. SUCRA ranking of screening strategies at 10-year follow-up. (a) CRC incidence; ITT analysis. (b) CRC mortality; ITT analysis. (c) All-cause mortality; ITT analysis. (d) CRC incidence; exploratory PP analysis. (e) CRC mortality; exploratory PP analysis. (f) All-cause mortality; exploratory PP analysis. Rankings were exploratory and based on SUCRA values; they should not be interpreted as a definitive hierarchy of screening strategies. Rankings involving FIT were particularly uncertain, as evidence was limited to a single trial with mostly indirect comparisons. Rankings from PP analyses (panels d–f) derive from analyses at high risk of bias and should be regarded as hypothesis-generating only.
For CRC incidence, FS ranked highest in both ITT and PP analyses, followed by CS. Among the lower-ranked nodes, the ranking order was FIT, gFOBT, and NS in ITT, and gFOBT, NS, and FIT in PP (Figure 5(a) and (d)). For CRC mortality, FS ranked highest in the ITT analysis, followed by gFOBT, CS, FIT, and NS (Figure 5(b)). By contrast, in the PP analysis, CS ranked highest, followed by FS, gFOBT, NS, and FIT (Figure 5(e)). For all-cause mortality, FS ranked highest in the ITT analysis, followed by CS, gFOBT, NS, and FIT (Figure 5(c)). By contrast, in the PP analysis, CS ranked highest, followed by FS, FIT/gFOBT, and NS (Figure 5(f)).
Sensitivity analyses
Sensitivity analyses excluding the trial with two FS screening rounds22 or the trial enrolling participants aged ⩾75 years24 yielded results consistent with the primary analyses, with no material change in the overall interpretation (Supplemental Tables 12 and 13). Findings were also stable across random-effects and fixed-effects models (Supplemental Table 14). In additional sensitivity analyses combining FIT and gFOBT into a single node (FIT/gFOBT), the pooled RRs and SUCRA rankings were broadly unchanged compared with the primary analysis, supporting the robustness of the overall findings. Certainty of evidence was higher under the combined-node approach, with many ITT comparisons rated as moderate (Supplemental Figures 4–7; Supplemental Tables 15–17).
To assess the potential impact of variation in follow-up duration across trials, we also performed sensitivity analyses using person-years as the denominator and person-year-based rate ratios. Findings were broadly consistent with those of the primary participant-based RR analyses: most comparisons showed similar effect estimates and stable ranking patterns. The main exception was CS versus NS for CRC incidence, which was statistically significant in the primary participant-based analysis but not in the person-year-based sensitivity analysis (Table 3; Supplemental Tables 18 and 19).
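The difference between the two effect measures can be illustrated with the standard large-sample formulas for the log risk ratio and the log rate ratio. In this sketch, with hypothetical counts and equal mean follow-up in both arms, the point estimates coincide, but the rate-ratio CI is slightly wider because its variance omits the −1/n terms; unequal follow-up would also shift the point estimate. It illustrates the two measures in general and is not a reanalysis of any included trial; the function names are illustrative.

```python
import math


def risk_ratio(d1, n1, d0, n0):
    """Cumulative risk ratio with 95% CI (participants as denominators)."""
    est = (d1 / n1) / (d0 / n0)
    se = math.sqrt(1 / d1 - 1 / n1 + 1 / d0 - 1 / n0)
    return est, est * math.exp(-1.96 * se), est * math.exp(1.96 * se)


def rate_ratio(d1, py1, d0, py0):
    """Incidence rate ratio with 95% CI (person-years as denominators)."""
    est = (d1 / py1) / (d0 / py0)
    se = math.sqrt(1 / d1 + 1 / d0)
    return est, est * math.exp(-1.96 * se), est * math.exp(1.96 * se)


# Hypothetical trial with 10.5 years of mean follow-up in both arms
rr = risk_ratio(d1=200, n1=20_000, d0=480, n0=40_000)
irr = rate_ratio(d1=200, py1=20_000 * 10.5, d0=480, py0=40_000 * 10.5)
print("risk ratio %.2f (%.2f-%.2f)" % rr)
print("rate ratio %.2f (%.2f-%.2f)" % irr)
```

A slightly wider interval can move an estimate whose upper limit lies close to 1.0 across the significance threshold, without any material change in the point estimate.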
Table 3. Primary participant-based RRs and person-year-based rate ratios for screening strategies versus NS (ITT analysis).
Bold indicates statistical significance. “Lost” indicates that the association was statistically significant in the participant-based analysis but not in the person-year-based analysis.
CI, confidence interval; CRC, colorectal cancer; CS, colonoscopy; FIT, fecal immunochemical test; FS, flexible sigmoidoscopy; gFOBT, guaiac-based fecal occult blood test; ITT, intention-to-treat; NS, no screening; RR, risk ratio.
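The two denominators in the person-year sensitivity analysis differ as follows: the participant-based RR divides events by the number randomized in each arm, whereas the rate ratio divides events by person-years of follow-up and so adjusts for differences in follow-up time, under the assumption of a constant event rate. A minimal sketch with Wald-type 95% CIs on the log scale is shown below; the function names and counts are hypothetical and do not reproduce any included trial, and in the NMA such trial-level log estimates would be pooled rather than compared directly.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% CI


def risk_ratio(events_1, n_1, events_0, n_0):
    """Participant-based risk ratio with a Wald 95% CI on the log scale."""
    rr = (events_1 / n_1) / (events_0 / n_0)
    se = math.sqrt(1 / events_1 - 1 / n_1 + 1 / events_0 - 1 / n_0)
    return rr, rr * math.exp(-Z_975 * se), rr * math.exp(Z_975 * se)


def rate_ratio(events_1, py_1, events_0, py_0):
    """Person-year-based rate ratio (constant event rate assumed) with a Wald 95% CI."""
    irr = (events_1 / py_1) / (events_0 / py_0)
    se = math.sqrt(1 / events_1 + 1 / events_0)
    return irr, irr * math.exp(-Z_975 * se), irr * math.exp(Z_975 * se)


# Hypothetical trial: screening arm vs no screening (illustration only).
print("RR  %.3f (%.3f-%.3f)" % risk_ratio(200, 20_000, 250, 20_000))
print("IRR %.3f (%.3f-%.3f)" % rate_ratio(200, 195_000, 250, 196_000))
```

Because the rate-ratio standard error omits the small negative terms in the RR formula, its CI is slightly wider for the same event counts; when person-years must be estimated rather than taken from trial reports, that additional estimation uncertainty is not captured by either formula.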
Network meta-regression for uptake
In an exploratory network meta-regression using trial-level uptake (per 10% increase), there was no clear evidence that uptake modified the relative effects of FS or gFOBT on CRC incidence, CRC mortality, or all-cause mortality (Supplemental Table 20). Effects for FIT were not estimable because of sparse information, and estimates for CS were highly unstable, with very wide CIs, reflecting limited data and uptake variability across comparisons.
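One common way to parameterize such a model is sketched below; this form is assumed for illustration, the exact specification used is not described in this section, and multi-arm correlations are ignored (two-arm trial $i$, strategy $k$ versus NS):

$$
\log \mathrm{RR}_{ik} = d_k + \beta_k \, \frac{u_i - \bar{u}}{10} + \varepsilon_{ik}, \qquad \varepsilon_{ik} \sim \mathcal{N}\!\left(0,\; s_{ik}^{2} + \tau^{2}\right)
$$

Here $u_i$ is uptake (%) in trial $i$, $\bar{u}$ is mean uptake, $d_k$ is the log RR of strategy $k$ versus NS at mean uptake, $s_{ik}^{2}$ is the within-trial variance, and $\tau^{2}$ is the between-trial heterogeneity. Under this form, $\exp(\beta_k)$ is the factor by which the RR changes per 10-percentage-point increase in uptake, and a 95% CI for $\exp(\beta_k)$ that includes 1 corresponds to no clear evidence of effect modification by uptake.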
Pairwise meta-analyses
Pairwise meta-analysis results (Supplemental Figures 8–13) were consistent with those of the NMA.
Time-stratified effectiveness of FS and gFOBT
In the time-stratified ITT analysis, we assessed long-term effects beyond 10 years (Figure 6(a)–(c)). Fifteen- and 20-year data were available only for FS, gFOBT, and NS, not for CS or FIT, and a time-stratified PP analysis was not feasible owing to insufficient data. The 15-year analysis included studies reporting CRC incidence (FS, 4 studies; gFOBT, 1 study), CRC mortality (FS, 3 studies; gFOBT, 1 study), and all-cause mortality (FS, 1 study; gFOBT, 1 study). The 20-year analysis included studies reporting CRC incidence (FS, 1 study; gFOBT, 2 studies), CRC mortality (FS, 2 studies; gFOBT, 2 studies), and all-cause mortality (FS, 2 studies; gFOBT, 2 studies).

Figure 6. Time-stratified effectiveness of screening strategies. (a) CRC incidence; ITT analysis. (b) CRC mortality; ITT analysis. (c) All-cause mortality; ITT analysis. The 15- and 20-year analyses were conducted using FS, gFOBT, and NS data only.
For CRC incidence, FS showed a sustained benefit versus NS at both 15 and 20 years, whereas gFOBT showed no reduction at either time point. For CRC mortality, FS reduced risk versus NS at 15 years, and the reduction persisted at 20 years; gFOBT showed no benefit at 15 years but a reduction at 20 years. For all-cause mortality, neither FS nor gFOBT showed a clear long-term benefit.
Discussion
In this NMA, we compared CS, FS, FIT, and gFOBT within a unified framework, using ITT analyses to assess invitation-based population effectiveness and PP analyses as a complementary exploratory approach to contextualize outcomes among screening completers.
First, in ITT analyses, CS reduced CRC incidence but did not significantly reduce CRC mortality or all-cause mortality compared with NS. This modest population-level impact may reflect, at least in part, low screening uptake in the included trials (31.8%–42.0%). Contemporary population-based surveys suggest higher real-world uptake in some settings (e.g., in the United States, 70.4% of adults were up to date with CRC screening in 2020, with CS accounting for 64.5% of tests). 38 However, uptake remains heterogeneous across countries and strategies, ranging from approximately 50%–60% in some European settings to as low as approximately 20% in others. 4 Although uptake among invited trial participants and real-world up-to-date screening rates are not directly comparable, the ITT findings suggest that population-level effectiveness may be attenuated when screening uptake is low.
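The attenuation argument can be made concrete with a simple dilution approximation, which is an illustration rather than a method used in this NMA. Assuming (hypothetically) that invited non-participants keep the control-arm risk and that participants' risk is multiplied by a ratio r, the ITT RR at uptake u is approximately u × r + (1 − u). This ignores selection effects such as the healthy screenee effect discussed in this section; the function name and inputs are hypothetical.

```python
def diluted_itt_rr(uptake, rr_participants):
    """Approximate ITT risk ratio when only a fraction of invitees are screened.

    Assumes non-participants keep the control-arm risk and participants'
    risk is multiplied by rr_participants (no selection differences).
    """
    if not 0.0 <= uptake <= 1.0:
        raise ValueError("uptake must be a proportion between 0 and 1")
    if rr_participants <= 0:
        raise ValueError("rr_participants must be positive")
    return uptake * rr_participants + (1.0 - uptake)


# Hypothetical within-participant RR of 0.5 at three uptake levels.
for uptake in (0.3, 0.4, 0.7):
    print(f"uptake {uptake:.0%}: ITT RR ~ {diluted_itt_rr(uptake, 0.5):.2f}")
```

Under these assumptions, the same per-participant benefit translates into a much smaller invitation-level effect at lower uptake.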
Importantly, the reduction in CRC incidence for CS versus NS observed in the primary participant-based analysis was no longer statistically significant in the person-year-based sensitivity analysis. Although this may partly reflect additional uncertainty because person-years were not directly reported in some trials and had to be estimated, the result indicates that the estimated effect of CS on CRC incidence is sensitive to how follow-up duration is handled across studies. Accordingly, the ITT finding for CRC incidence with CS should be interpreted cautiously, and further evidence based on directly reported person-year data is needed.
The exploratory PP analyses suggested larger reductions in CRC incidence, CRC mortality, and all-cause mortality for CS; however, these findings must be interpreted with considerable caution. PP analyses represent non-randomized within-trial comparisons restricted to screening completers and are therefore highly susceptible to selection bias, including the “healthy screenee” effect. The apparently strong all-cause mortality benefit of CS in the PP analysis is particularly difficult to interpret causally and should not be taken as evidence of a true mortality benefit. Accordingly, all PP findings should be regarded as hypothesis-generating only and should not be used to inform clinical recommendations or policy.
Second, FS warrants renewed attention. Although several guidelines have de-emphasized it as a primary screening option, FS reduced CRC incidence and mortality in the ITT analyses, and trials with 15–20 years of follow-up showed durable reductions in these outcomes. FS is commonly implemented as part of a stepwise strategy in which individuals with positive findings are referred for CS. Distal adenomas ⩾10 mm, with villous histology, or with high-grade dysplasia are strongly associated with synchronous proximal advanced neoplasia and long-term proximal CRC risk, and can therefore guide selective referral for CS.39,40 However, nearly half of proximal advanced neoplasias occur without distal lesions, and even when high-risk criteria are applied, up to 38% of all proximal advanced neoplasias, including 17% of proximal cancers, may go undetected.39,41,42 FS also carries lower complication risks than CS, with markedly lower rates of bleeding and perforation (e.g., 8 vs 68–698 per 100,000 for bleeding and 0.88 vs 1.96 per 1000 for perforation).43,44 Taken together, FS may remain a pragmatic and safer option where CS capacity is constrained, and may be particularly attractive for older adults or for resource-limited programs aiming to expand participation while minimizing harms.
Third, the preventive profiles of CS and stool-based strategies differ in ways that may matter for program design. CS can prevent CRC by detecting and removing precancerous lesions (a strong primary prevention component), whereas stool-based strategies primarily detect bleeding from advanced adenomas or early cancers and thereby reduce mortality through earlier detection. In our ITT analysis, gFOBT significantly reduced CRC mortality, indicating that it can achieve meaningful population-level effectiveness. However, the effectiveness of stool-based screening reflects not only the intrinsic performance of the test but also programmatic factors, such as timely colonoscopic follow-up after positive tests and sustained participation over time. Because stool-based programs are delivered over repeated rounds, declining adherence may reduce program effectiveness by lowering cumulative screening exposure. 45 The included trials showed a similar pattern: although participation within individual rounds remained relatively high in some trials, cumulative completion across repeated rounds declined over time. Maintaining long-term adherence across screening rounds therefore remains an important challenge for stool-based screening programs.
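Why relatively high per-round participation can coexist with declining cumulative completion can be shown with a short calculation. The sketch assumes, hypothetically, that each round's participation is conditional on having completed all previous rounds (trials may instead report participation among all invitees); the function name and rates are illustrative and not drawn from the included trials.

```python
from itertools import accumulate
from operator import mul


def cumulative_completion(conditional_participation):
    """Fraction of invitees who have completed every round up to each round.

    conditional_participation[r] is the (hypothetical) probability that a person
    who completed all earlier rounds also completes round r; for round 1 this is
    simply first-round participation.
    """
    for p in conditional_participation:
        if not 0.0 <= p <= 1.0:
            raise ValueError("participation rates must be proportions between 0 and 1")
    # Running product: P(completed rounds 1..r) = p1 * p2 * ... * pr
    return list(accumulate(conditional_participation, mul))


# Hypothetical program: 60% first-round participation, then 85% per round.
rates = [0.60, 0.85, 0.85, 0.85]
for round_no, share in enumerate(cumulative_completion(rates), start=1):
    print(f"round {round_no}: {share:.1%} completed all rounds so far")
```

Under these assumptions, fewer than 40% of invitees would complete all four rounds even though every round after the first retains 85% of eligible participants.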
This NMA has several limitations. First, the primary network was sparse and lacked closed loops, which precluded formal assessment of inconsistency and limits confidence in indirect comparisons. In addition, adherence patterns and exposure to repeated screening rounds differed across trials, particularly for stool-based strategies, and one FS trial used two screening rounds whereas the others used a single round. Although sensitivity analyses suggested that the additional FS round had little impact on the overall conclusions, some observed differences between strategies may reflect how the programs were implemented rather than differences in the tests alone. Indirect comparisons should therefore be interpreted with caution. Second, relatively few trials reported PP outcomes, limiting precision. Moreover, PP analyses are inherently susceptible to selection bias, including the “healthy screenee” effect, because participants who undergo screening may differ systematically from those who do not adhere, potentially exaggerating the apparent benefit of screening. Accordingly, PP-based evidence was judged to be at high risk of bias and of very low certainty. Alternative causal inference approaches, such as complier average causal effect or instrumental variable analyses, may better address non-adherence in randomized screening trials; however, these methods generally require individual participant data or trial-specific modeling assumptions and were therefore not feasible in this study-level NMA. Third, evidence for FIT was limited to a single RCT, which compared FIT with CS rather than with NS, so the network provided limited direct evidence for FIT relative to other strategies. Caution is therefore warranted when extrapolating these findings to contemporary FIT-based programs, and the performance of modern FIT-based programs should not be inferred primarily from gFOBT-based evidence. Additional randomized evidence will be important to clarify the long-term comparative effectiveness of FIT.
Fourth, follow-up beyond 10 years was available only for FS and gFOBT, with no comparable long-term data for CS or FIT. The 15–20-year findings should therefore be interpreted only as long-term evidence for FS and gFOBT, not as rankings or comparative evidence across all strategies. Fifth, all included trials were conducted in Western countries among average-risk adults, so generalizability to regions with different health systems, resource availability, or screening cultures may be limited. Implementation in low-resource or non-Western settings also requires consideration of the capacity to provide follow-up CS after a positive screening test. In stepwise strategies based on FS, FIT, or gFOBT, positive results require diagnostic CS, but the resulting demand is not uniform, because both test positivity and completion of follow-up CS vary across strategies and settings. Although these strategies avoid primary CS in test-negative individuals, real-world effectiveness may be attenuated without timely follow-up CS. Local endoscopic capacity, referral pathways, and resource constraints should therefore be considered when translating these findings to screening programs outside Western settings. Finally, although restricting inclusion to RCTs maximized internal validity, randomized trials may not fully capture the complexity of real-world adherence and implementation over time. Beyond comparative effectiveness, contextual factors such as cost-effectiveness, false-positive rates, potential harms, and endoscopic capacity must also inform the choice of screening strategy. The central challenge is not to establish a definitive hierarchy of screening strategies based on estimated effects, but to translate these findings into real-world effectiveness by addressing the social, behavioral, and structural determinants of screening participation.
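The point that follow-up CS demand is not uniform can be expressed as a simple product of the number invited, participation, test positivity, and completion of follow-up CS. The sketch below assumes these proportions are known and independent; the function name and all inputs are hypothetical and are not drawn from the included trials.

```python
def colonoscopy_demand(invited, participation, positivity, follow_up_completion):
    """Expected diagnostic colonoscopies generated by a stepwise screening strategy.

    All inputs except `invited` are proportions (0-1). Every participant with a
    positive test is assumed to be referred; only completed referrals consume
    endoscopy capacity.
    """
    for name, value in (("participation", participation),
                        ("positivity", positivity),
                        ("follow_up_completion", follow_up_completion)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a proportion between 0 and 1")
    positives = invited * participation * positivity
    return positives * follow_up_completion


# Hypothetical inputs per 10,000 invitees (illustration only, not trial data).
scenarios = {
    "strategy X (higher positivity)": (0.50, 0.08, 0.85),
    "strategy Y (lower positivity)": (0.60, 0.04, 0.75),
}
for label, (participation, positivity, completion) in scenarios.items():
    n = colonoscopy_demand(10_000, participation, positivity, completion)
    print(f"{label}: about {n:.0f} follow-up colonoscopies")
```

With these illustrative inputs, the higher-positivity strategy generates nearly twice as many follow-up colonoscopies despite lower participation, which is why local endoscopic capacity matters when choosing among stepwise strategies.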
Despite these limitations, this study has several important strengths. To our knowledge, this is the first NMA to evaluate CS, FS, FIT, and gFOBT within a unified framework, using ITT analyses to assess invitation-based population effectiveness and PP analyses as a complementary exploratory assessment of outcomes among screening completers. Harmonizing follow-up to approximately 10 years—aligned with the recommended CS interval—allowed more meaningful cross-trial comparisons. Restricting inclusion to RCTs improved internal validity, and multiple sensitivity analyses supported the overall interpretation of the findings. These methodological features enhance the clinical relevance and interpretability of the results.
Conclusion
In this NMA, CS reduced CRC incidence in ITT analyses but did not show a significant reduction in CRC mortality or all-cause mortality, suggesting limited population-level effectiveness when uptake was low. FS reduced CRC incidence and mortality in ITT analyses, and gFOBT reduced CRC mortality, indicating that meaningful population-level benefit may also be achieved through screening strategies that are scalable and acceptable in real-world settings. Future screening programs should adopt context-dependent screening strategies that align with local resources, endoscopic capacity, and population behaviors, thereby improving population-level effectiveness. PP-based findings in this study were derived from exploratory analyses with very low certainty of evidence and high risk of bias, and therefore should not be used in isolation to guide clinical recommendations, guideline development, or policy decisions.
Supplemental Material
sj-docx-1-tag-10.1177_17562848261454186 – Supplemental material for Long-term effectiveness of endoscopic and stool-based colorectal cancer screening strategies: a systematic review and network meta-analysis
Supplemental material, sj-docx-1-tag-10.1177_17562848261454186 for Long-term effectiveness of endoscopic and stool-based colorectal cancer screening strategies: a systematic review and network meta-analysis by Motoki Kaneko and Atsushi Sakuraba in Therapeutic Advances in Gastroenterology
sj-docx-2-tag-10.1177_17562848261454186 – Supplemental material for Long-term effectiveness of endoscopic and stool-based colorectal cancer screening strategies: a systematic review and network meta-analysis
Supplemental material, sj-docx-2-tag-10.1177_17562848261454186 for Long-term effectiveness of endoscopic and stool-based colorectal cancer screening strategies: a systematic review and network meta-analysis by Motoki Kaneko and Atsushi Sakuraba in Therapeutic Advances in Gastroenterology
Footnotes
References
