Stacked random forest model for colorectal cancer detection using complete blood counts

Abstract

Background

In China, adherence to screening colonoscopy among eligible individuals remains suboptimal, primarily due to cost concerns and potential adverse effects. A machine learning model utilizing complete blood count (CBC) data could help prioritize colonoscopy referrals and improve screening participation.

Method

This multicenter study included participants who underwent CBC testing within three months before colonoscopy. CBC data were classified into three types (A, B, and C) based on hematology analyzer capabilities, with Type C excluded from analysis. Using Types A and B, we developed a stacking machine learning model incorporating 24 CBC features and 5 combined CBC components to predict colorectal cancer (CRC). Model performance was evaluated using the area under the curve (AUC), specificity, and sensitivity.

Results

The study included 1795 CRC cases and 26,380 cancer-free individuals with CBC data. On external validation, the model achieved 80.3% specificity and 65.2% sensitivity. Notably, it demonstrated 41% sensitivity for Stage I CRC and 57.6% sensitivity for Stages I–III combined.

Conclusions

CBC testing, combined with electronic medical record data, is a low-cost and widely accessible tool. Our robust CRC risk prediction model can serve as a preliminary screening method, aiding in colonoscopy referral decisions and improving CRC screening efficiency.

Keywords

Colorectal cancer, colonoscopy, electronic medical record, complete blood count, stacking machine learning model

Introduction

Colorectal cancer (CRC) represents a significant public health challenge in China, ranking as the second most prevalent cancer and fourth leading cause of cancer-related mortality.¹ While colonoscopy screening every 5–10 years for individuals ≥45 years old can significantly reduce CRC incidence and mortality,² China's compliance rates remain below 50% due to socioeconomic disparities and limited healthcare access.^3–5 This screening gap underscores the urgent need for cost-effective, non-invasive alternatives. Although emerging technologies like stool DNA tests and cfDNA assays show promise,^6,7 their high cost prohibits widespread adoption. Thus, the development of innovative, cost-effective, and user-friendly screening options is essential to promote greater participation in colonoscopy.

The complete blood count (CBC), known as the routine blood test in China, represents one of the most widely performed clinical laboratory tests nationwide, with an estimated 600 million tests conducted annually. Nearly all adults undergo CBC testing at least yearly for general health assessments, making it an ideal candidate for population-wide colorectal cancer (CRC) screening. Emerging evidence demonstrates the diagnostic and prognostic value of CBC-derived parameters in CRC, including established indices like red blood cell distribution width (RDW), hemoglobin concentration, and platelet count. Furthermore, composite inflammatory markers derived from CBC, particularly the neutrophil-to-lymphocyte ratio (NLR) and platelet-to-lymphocyte ratio (PLR), have demonstrated significant prognostic value in oncology.^8–11 However, the potential diagnostic value of additional CBC-derived composite markers remains underexplored, and the synergistic diagnostic performance achievable through comprehensive integration of multiple CBC features with composite markers has not been systematically evaluated. These limitations highlight the need for more sophisticated analytical approaches to fully exploit the diagnostic potential embedded within routine CBC parameters.

Machine learning (ML) has emerged as a powerful approach for data mining and pattern recognition in biomedical research. However, the generalization capability of data-driven models is frequently compromised by multiple confounding factors. Technical variations, including differences in testing instruments, batch effects, and inter-institutional protocols, along with biological heterogeneity stemming from demographic factors such as ethnicity, gender, and age,¹² collectively limit the clinical utility of CBC data for CRC screening applications. To date, these challenges have constrained the use of CBC-based machine learning models in CRC screening, diagnostic workflows, and other clinical practice.

In this study, we developed a robust colorectal cancer (CRC) risk stratification model by leveraging electronic medical record (EMR) data from complete blood count (CBC) tests. Our approach employs a stacked ensemble methodology^13,14 that integrates multiple random forest base learners, thereby enhancing generalization capability across distinct clinical settings. The rest of this paper is organized as follows: the Method section details the multicenter dataset and stacked model architecture; the Results section presents performance comparisons with existing methods; the Discussion section covers clinical implications and limitations; and the Conclusions section closes with future directions.

Method

Subjects

Figure 1 shows the workflow of this study. Participants were recruited from eight hospitals, mainly in south and east China (Table S1), from 2015 to 2021. Individuals diagnosed with CRC or adenomatous polyps with a diameter ≥0.5 cm by colonoscopy and pathology at the participating centers were included. CBC and demographic data were collected from the EMR. Inclusion criteria: (1) underwent colonoscopy; (2) age ≥25 years; (3) had a CBC record within 0–6 months before colonoscopy; (4) colonoscopy revealed lesions with polyp diameter ≥0.5 cm; (5) histological diagnosis confirmed primary colorectal cancer or adenomatous polyps. Exclusion criteria: (1) underwent emergency colonoscopy; (2) history of blood transfusion within 3 weeks before the CBC test; (3) failed colonoscopy procedure; (4) pregnancy; (5) CBC test conducted more than 3 days after colonoscopy.

Figure 1.

Flow chart of the study design.

Data preparation

Based on the classification capability of the hematology analyzer, CBC data were categorized into three types (A, B, and C), with category C excluded. Category A: identified by the presence of a nucleated red blood cell count, with 26 parameters reported. Category B: identified by the absence of a nucleated red blood cell count but with white blood cells divided into five categories (neutrophils, eosinophils, basophils, monocytes, and lymphocytes), comprising 23 parameters. Category C: characterized by the categorization of white blood cells into granulocytes, lymphocytes, and monocytes, with 19–21 parameters; often used in smaller clinics.

We used category A data with 713 CRC cases and 4106 cancer-free individuals as the training set for the model, category B data with 206 CRC cases and 10,091 cancer-free individuals as the tuning set, and category A and B data from different centers as independent external validation sets (Figure 1). Input data included sex, age, 24 CBC items, and 5 combined CBC components: neutrophil count/lymphocyte count (NLR), monocyte count/lymphocyte count (MLR), platelet count/lymphocyte count (PLR), neutrophil count × platelet count/lymphocyte count (NPLR), and monocyte count × platelet count/lymphocyte count (MPLR). Missing values were imputed using MiceImputer from the autoimpute.imputations module in Python.
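The five combined components defined above can be sketched as a small helper. This is an illustrative sketch, not the authors' code: the function name and argument names are hypothetical, counts are assumed to be absolute counts in 10⁹/L, and NPLR and MPLR are read as (count × platelet count)/lymphocyte count, which is an assumption about operator precedence that matches the magnitudes reported in Table 1.

```python
def combined_components(neutrophils, lymphocytes, monocytes, platelets):
    """Return the five combined CBC components from absolute counts (10^9/L).

    Hypothetical helper; NPLR and MPLR are computed as (a * platelets) / lymphocytes.
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return {
        "NLR": neutrophils / lymphocytes,
        "MLR": monocytes / lymphocytes,
        "PLR": platelets / lymphocytes,
        "NPLR": neutrophils * platelets / lymphocytes,
        "MPLR": monocytes * platelets / lymphocytes,
    }


# Illustrative input: the training-set cancer-free medians from Table 1.
# Ratios of medians only approximate the tabulated medians of ratios.
example = combined_components(neutrophils=3.32, lymphocytes=1.88,
                              monocytes=0.42, platelets=223)
```

Because every component divides by the lymphocyte count, a record with a missing or zero lymphocyte count has to be imputed or rejected before these features are computed; the sketch simply raises an error in that case.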

Model development

In this study, the number of CRC-positive samples was much smaller than the number of negative samples. We undersampled the negative samples to reduce the impact of class imbalance, which may lead to a biased model.¹⁵ Random forest sub-models with different CRC diagnostic sensitivities and specificities were trained, and we prioritized high-specificity sub-models for CRC diagnosis. The ratio of positive to negative samples used to train each sub-model was determined by the model's performance on the different CBC data categories, on each sex, and on individuals over 50 years old, with endpoint criteria of specificity ≥80% and sensitivity ≥50%. Five-fold cross-validation was used to optimize the parameters and evaluate each random forest sub-model. Finally, all random forest sub-models were stacked to establish a CRC risk prediction model that also incorporates hematocrit as an anemia feature. See the Supplementary information for the detailed framework of the stacking model.
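Undersampling negatives to a chosen positive:negative ratio can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, labels are assumed to be coded 1 for CRC and 0 for cancer-free, and negatives are drawn uniformly at random, which the paper does not specify.

```python
import random


def undersample_negatives(labels, pos_to_neg=(1, 5), seed=0):
    """Return sorted row indices keeping all positives and a random subset of negatives.

    pos_to_neg=(1, 5) keeps up to five negatives per positive (a 1:5 ratio).
    Hypothetical helper; uniform random selection is an assumption.
    """
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    p, n = pos_to_neg
    # Never request more negatives than are available.
    n_keep = min(len(neg_idx), len(pos_idx) * n // p)
    rng = random.Random(seed)
    kept_neg = rng.sample(neg_idx, n_keep)
    return sorted(pos_idx + kept_neg)


# Toy labels: 10 positives followed by 200 negatives.
labels = [1] * 10 + [0] * 200
idx_1_to_5 = undersample_negatives(labels, pos_to_neg=(1, 5))
idx_2_to_3 = undersample_negatives(labels, pos_to_neg=(2, 3))
```

Each sub-model would then be fitted on its own subsample; changing the ratio shifts the trade-off between sensitivity and specificity of the resulting classifier.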

We also established random forest (RF), logistic regression (Logist), decision tree (DT), gradient boosting decision tree (GBDT; labeled GBT in Table 3), and support vector machine (SVM) models with a 1:1 ratio of positive to negative samples for comparison with the stacking model developed in this study. Evaluation metrics included specificity, sensitivity, AUC, and accuracy.

Statistical analysis

Data analyses were conducted using Python version 3.6.8 and R version 4.1.3. Continuous variables with skewed distributions were presented as median with interquartile range and compared using the Mann–Whitney U test or Kruskal–Wallis H test. Categorical variables were presented as numbers with percentages and compared using the Chi-square test or Fisher's exact test. The AUCs were used to evaluate the predictive power, and the optimal cutoff value was established by maximizing the Youden index (sensitivity + specificity - 1). A two-tailed P value < 0.05 was considered statistically significant.
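Cutoff selection by the Youden index can be sketched in a few lines. The function name and the toy scores below are hypothetical, and the convention that a sample is called positive when its score is at or above the cutoff is an assumption.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when score >= cutoff (assumed convention).
    Ties in J keep the lowest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy data: predicted risk scores and true labels (1 = CRC, 0 = cancer-free).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]
cutoff, j = youden_cutoff(scores, labels)  # cutoff 0.7 gives J = 0.75
```

On the toy data, a cutoff of 0.7 yields sensitivity 3/4 and specificity 4/4, so J = 0.75.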

Results

CBC profiles differ substantially between category A and B

A total of 1795 CRC cases and 26,380 cancer-free individuals were included. The training set consisted of 713 CRC cases and 4106 cancer-free individuals, and 206 CRC cases and 10,091 cancer-free individuals with category B data were assigned as the tuning set. The external validation sets comprised 2977 category A samples and 10,082 category B samples (Figure 1, Table 1). In the training set, all 31 features differed significantly between the CRC and cancer-free groups (P < 0.05). However, in the tuning set, there were no differences between the CRC and cancer-free groups in white blood cell count, eosinophil count and percentage, mean corpuscular volume, and mean platelet volume. Additionally, all features of the cancer-free group in the training set (category A) differed significantly from those of the cancer-free group in the tuning set (category B, P < 0.05).

Table 1.

Characteristics of the training, tuning and validation datasets.

	Training		Tuning		Category A Validation		Category B Validation
	Cancer free*#	CRC*	Cancer free#	CRC	Cancer free	CRC	Cancer free	CRC
n	4106	713	10,091	206	2352	625	9831	251
Age	54 (46–63)	62 (50–72)	52 (41–61)	66 (56–72)	50 (39–60)	62 (51–70)	51 (39–63)	66 (58–74)
Male (%)	2713 (66.1)	441 (61.9)	5426 (53.8)	135 (65.5)	1330 (56.6)	370 (59.2)	5310 (54.0)	154 (61.4)
MPLR	49.0 (36.9–64.8)	74.8 (50.9–133.6)	40 (30.2–53.2)	53.0 (39.6–74.9)	46.8 (35.6–62.9)	79 (50.1–136.1)	50.0 (38.2–65.8)	57.0 (38.7–86.9)
NPLR	383.4 (275.5–547.6)	621.4 (369.3–1260.1)	399.3 (292.6–552.2)	512.9 (373.0–712.9)	438.2 (316.4–634.5)	828.7 (503.7–1622.8)	413.5 (301.6–572.7)	534.3 (375.2–893.0)
MLR	0.22 (0.18–0.28)	0.31 (0.23–0.48)	0.19 (0.15–0.24)	0.23 (0.19–0.32)	0.19 (0.15–0.25)	0.29 (0.21–0.45)	0.23 (0.18–0.29)	0.25 (0.19–0.36)
NLR	1.7 (1.3–2.3)	2.6 (1.7–4.7)	1.9 (1.5–2.5)	2.2 (1.8–3.2)	1.8 (1.4–2.4)	3.0 (2.1–5.1)	1.9 (1.5–2.5)	2.4 (1.7–3.5)
PLR	117.3 (92.1–149.4)	155 (111.7–237.2)	116.0 (92.7–145.8)	145.7 (107.3–189.3)	117.8 (91.9–150)	174.8 (121.4–242.2)	117.6 (93.9–147.0)	134.2 (95.0–180.9)
White blood cell count, 10⁹/L	6 (5–7.1)	6.7 (5.3–8.7)	5.9 (5–7)	6.05 (5–7.3)	6.54 (5.45–7.8525)	7.46 (6.12–9.4)	6.1 (5.15–7.26)	6.65 (5.515–7.915)
Neutrophil percentage, %	56.6 (50.5–63)	63.7 (55.4–75)	59.7 (54.1–65.5)	62.15 (57.175–69.475)	58.7 (52.7–64.9)	68.3 (60.3–77.1)	58.9 (53.1–65)	63.5 (56.8–70.9)
Lymphocyte percentage, %	32.6 (26.8–38.2)	25.1 (15.9–32.3)	31.5 (26.3–36.8)	27.7 (21.7–32.375)	32.1 (26.5–37.9)	22.4 (14.9–29.3)	31.1 (25.6–36.3)	26.5 (20.45–33.1)
Monocyte percentage, %	7.1 (5.9–8.4)	7.5 (6–9)	5.9 (5–7)	6.4 (5.4–7.6)	6.1 (5.1–7.2)	6.4 (5–7.9)	7 (5.9–8.2)	6.6 (5.3–8.3)
Eosinophil percentage, %	2.1 (1.3–3.5)	1.9 (0.7–3.4)	1.6 (1–2.7)	1.9 (1.1–3)	1.9 (1.1–3)	1.6 (0.6–3.1)	1.5 (0.9–2.6)	1.7 (0.8–2.7)
Basophil percentage, %	0.6 (0.4–0.8)	0.4 (0.2–0.6)	0.4 (0.3–0.6)	0.4 (0.3–0.6)	0.4 (0.3–0.6)	0.3 (0.2–0.5)	0.6 (0.4–0.8)	0.4 (0.3–0.675)
Neutrophil number, 10⁹/L	3.32 (2.66–4.18)	4.16 (2.96–6.38)	3.48 (2.79–4.32)	3.66 (2.87–4.705)	3.8 (3–4.8)	5 (3.6–6.9)	3.57 (2.85–4.47)	4.11 (3.21–5.455)
Lymphocyte number, 10⁹/L	1.88 (1.51–2.31)	1.52 (1.07–1.99)	1.82 (1.46–2.23)	1.59 (1.305–1.9875)	2.1 (1.6–2.5)	1.6 (1.2–2.09)	1.85 (1.5–2.26)	1.71 (1.31–2.17)
Monocyte number, 10⁹/L	0.42 (0.34–0.52)	0.49 (0.38–0.65)	0.35 (0.28–0.43)	0.39 (0.32–0.4775)	0.4 (0.3–0.5)	0.5 (0.36–0.61)	0.43 (0.34–0.53)	0.45 (0.33–0.565)
Eosinophil number, 10⁹/L	0.13 (0.07–0.21)	0.13 (0.05–0.21)	0.1 (0.06–0.16)	0.11 (0.07–0.18)	0.12 (0.07–0.21)	0.11 (0.05–0.21)	0.09 (0.05–0.16)	0.1 (0.06–0.17)
Basophil number, 10⁹/L	0.03 (0.02–0.05)	0.03 (0.02–0.04)	0.02 (0.02–0.03)	0.02 (0.02–0.03)	0.03 (0.02–0.04)	0.02 (0.01–0.03)	0.03 (0.02–0.05)	0.03 (0.02–0.04)
Red blood cell count, 10¹²/L	4.66 (4.28–5.03)	4.32 (3.89–4.73)	4.6 (4.27–4.96)	4.36 (4.005–4.71)	4.66 (4.34–5.03)	4.25 (3.81–4.68)	4.67 (4.31–5.06)	4.37 (3.96–4.79)
Hemoglobin concentration, g/L	141 (129–152)	125 (105–139)	142 (131–154)	131.5 (116–145.75)	142 (131–153)	120 (99–135)	142 (131–154)	133 (113–142)
Hematocrit, %	42.2 (38.9–45.1)	38 (32.9–42)	42.4 (39.4–45.6)	39.8 (34.925–43.4)	42 (38.8–45)	36.7 (31.4–40.6)	41.7 (38.6–44.8)	39.5 (34.95–42.45)
Mean corpuscular volume, fL	91 (88.1–93.6)	89 (83.4–92.3)	92.3 (89.4–95.1)	92.25 (87.625–95.25)	90 (87.1–92.7)	87 (79.4–90.9)	89.3 (86.7–91.8)	89 (84.5–92.2)
Mean corpuscular hemoglobin, pg	30.5 (29.4–31.5)	29.4 (26.8–30.8)	31 (30–32)	30.6 (29–31.9)	30.6 (29.5–31.6)	28.8 (25.1–30.3)	30.5 (29.6–31.5)	29.8 (28.125–31.3)
RBC volume distribution width (RDW-CV), %	12.5 (12.1–13.1)	13 (12.4–14.8)	12.7 (12.3–13.1)	12.9 (12.4–13.6)	12.8 (12.3–13.3)	13.4 (12.7–15.6)	12.4 (12–12.9)	13 (12.4–14.1)
Mean corpuscular hemoglobin concentration, g/L	334 (327–341)	327 (316–336)	335 (329–341)	331 (323–337)	339 (333–345)	327 (313–337)	341 (334–348)	335 (326–342)
RBC volume distribution width (RDW-SD)	41.6 (39.8–43.7)	42.5 (40–45.9)	/	/	41.6 (39.8–43.4)	42.8 (40.3–46.4)	/	/
Platelet count, 10⁹/L	223 (187.25–261)	233 (187–297)	213 (179–250)	227.5 (182.25–279.75)	239 (203.75–280)	265 (212–342)	219 (183–257)	234 (184–271.5)
Platelet volume distribution width, %	11.7 (10.6–13.1)	11.5 (10.2–12.8)	16.2 (15.8–16.5)	16.1 (15.7–16.4)	15.3 (11.5–16.1)	11.9 (10.1–15)	11.7 (10.5–13.4)	12 (10.85–12.4)
Mean platelet volume, fL	10.3 (9.7–10.9)	10.1 (9.6–10.8)	9.3 (8.5–10.2)	9.4 (8.5–10.2)	9.8 (9.1–10.5)	9.7 (9–10.5)	10.2 (9.6–10.9)	10 (9.8–10.5)
Platelet hematocrit, %	0.23 (0.2–0.26)	0.24 (0.2–0.3)	0.2 (0.17–0.23)	0.21 (0.18–0.25)	0.23 (0.2–0.27)	0.26 (0.21–0.32)	0.22 (0.19–0.26)	0.2 (0.2–0.24)
Large platelet ratio, %	27.3 (22.3–32.5)	26.6 (21–31.5)	/	/	25.3 (19.9–29.4)	25.8 (20.1–28.8)	/	/

* Significant differences (P < 0.05) in all 31 features between the CRC and cancer-free groups in the training set.

# Significant differences (P < 0.05) in all CBC components of the cancer-free group between the training set (category A) and the tuning set (category B).

Combined components improved the accuracy of single CBC indicator in distinguishing CRC

ROC analysis of the 29 CBC components in the training set, category A validation set, and category B dataset showed that the iron-deficiency anemia indicators HGB and HCT ranked in the top five for CRC detection accuracy (AUC) in all three datasets (Figure 2(a) to (c)). The inflammation indicator LYMPH% and the combined components also demonstrated strong abilities to identify CRC (Figure 2(a) to (c)). The five CBC components in Figure 2(a) and (b) had AUC values greater than 0.7 for identifying CRC, but the accuracy decreased significantly in the category B dataset (Figure 2(c), Table S2). Furthermore, the five combined components improved on the accuracy of single inflammation indicators in distinguishing CRC (Table S2).
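For a single CBC component, the AUC equals the probability that a randomly chosen CRC case has a more extreme value than a randomly chosen cancer-free individual. The sketch below computes it by direct pairwise comparison; the function name is hypothetical and the hemoglobin values are made up for illustration, not taken from the study data.

```python
def auc_single_marker(case_values, control_values, higher_in_cases=True):
    """AUC of one marker as P(case value > control value), with ties counted as 0.5.

    Set higher_in_cases=False for markers that decrease in disease (e.g. HGB, HCT).
    Quadratic in sample size; intended as an illustration only.
    """
    if not case_values or not control_values:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    auc = wins / (len(case_values) * len(control_values))
    return auc if higher_in_cases else 1.0 - auc


# Made-up hemoglobin values (g/L); lower values are expected in CRC cases.
hgb_cases = [105, 120, 125, 139]
hgb_controls = [129, 131, 141, 152]
hgb_auc = auc_single_marker(hgb_cases, hgb_controls, higher_in_cases=False)
```

In the toy data only 2 of the 16 case-control pairs have the case value above the control value, so the AUC for "lower hemoglobin indicates CRC" is 1 − 2/16 = 0.875.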

Figure 2.

Accuracy of CBC components on colorectal cancer detection in the training set (a), category A validation set (b), and category B dataset (c).

Stacked random forests improve CRC risk prediction, detecting over half of early-stage cases and outperforming single ML models

In this study, we ultimately generated six RF sub-models to construct stacking models for predicting CRC risk levels using CBC data (Table 2). The stacking model for category A data consisted of RF1, RF2-M (A), RF2-F, and RF3 (A), while the stacking model for category B data comprised RF1, RF2-M (B), RF2-F, and RF3 (B). The most important features in all RF sub-models included the iron-deficiency anemia indicators HGB or HCT, along with the combined features MPLR and NPLR, with MPLR having the highest contribution in most RF sub-models (Table 2).

Table 2.

Random forest (RF) submodels in the stacking model.

RF submodel	Training condition (P/N)	Top five priority features
RF1	1:5	MPLR, HGB, NPLR, MLR, HCT
RF2-M (A)	1:2	MPLR, MLR, NPLR, HGB, HCT
RF2-M (B)	2:3	MPLR, MLR, HCT, HGB, NPLR
RF2-F	1:3	MPLR, NLR, HGB, NPLR, MLR
RF3 (A)	Age≥50, 1:2	MPLR, HGB, HCT, NPLR, LYMPH%
RF3 (B)	Age≥50, 1:1	NPLR, HCT, NE, MPLR, Age

The outputs of the CRC risk prediction model are categorized as “-”, “±”, “+”, “++”, and “+++”, representing different levels of CRC risk. “-” indicates no apparent abnormalities; “±” and “+” indicate low and moderately low CRC risk levels, suggesting varying degrees of iron-deficiency anemia; and “++” and “+++” indicate high CRC risk levels, suggesting immune abnormalities and/or iron-deficiency anemia. In this study, a CRC risk level equal to or higher than “+” was considered positive. The CRC risk prediction model was applied to predict the CRC risk levels of participants aged 30 and above in each dataset. Our CRC prediction stacking model achieved a specificity of 80.3% and a sensitivity of 65.2% on the external validation sets. In the category A external validation set, the stacking model had an AUC of 0.76, a specificity of 86.3%, and a sensitivity of 66.4%. In the category B tuning set, the AUC was 0.69, with a specificity of 83.8% and a sensitivity of 54.4%. In the category B external validation set, the AUC was 0.71, with a specificity of 78.9% and a sensitivity of 62.3%. Sensitivity was 41% for stage I CRC and 57.6% for stages I–III. The stacking model showed better generalization capability than the five other ML models (Table 3).

Table 3.

Accuracy of the CRC risk prediction stacking model and five other machine learning models in the tuning and validation datasets.

Model	Tuning (CRC: 206 vs. Cancer free: 9598)				Category A Validation (CRC: 610 vs. Cancer free: 2160)				Category B Validation (CRC: 249 vs. Cancer free: 9168)
Model	Acc, %	Spe, %	Sen, %	AUC	Acc, %	Spe, %	Sen, %	AUC	Acc, %	Spe, %	Sen, %	AUC
RF	83.1	83.8	50.5	0.67	78.9	80.3	73.9	0.77	77.0	77.6	55.0	0.66
DT	70.5	71.0	47.6	0.59	66.8	66.5	67.5	0.67	63.8	64.0	56.6	0.60
Logist	80.7	81.3	53.4	0.67	81.7	84.1	73.1	0.79	73.9	74.2	62.3	0.68
GBT	84.1	84.9	48.1	0.66	78.7	79.6	75.4	0.77	77.9	78.6	53.0	0.66
SVM	89.6	91.0	23.8	0.57	79.0	85.7	55.3	0.70	88.6	90.1	33.3	0.62
Stacking	83.2	83.8	54.4	0.69	81.9	86.3	66.4	0.76	78.4	78.9	62.3	0.71

Abbreviations: RF, random forest; DT, decision tree; Logist, logistic regression; GBT, gradient boosting trees; SVM, support vector machine; Acc, accuracy; Spe, specificity; Sen, sensitivity; AUC, area under the curve.
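The accuracy column in Table 3 is tied to the sensitivity and specificity columns through the group sizes: accuracy is the case-weighted average of the two. The short sketch below, with a hypothetical function name, reproduces the stacking model's category A validation accuracy from its reported sensitivity, specificity, and group sizes.

```python
def accuracy_from_rates(sensitivity, specificity, n_cases, n_controls):
    """Accuracy implied by sensitivity, specificity, and group sizes.

    accuracy = (TP + TN) / N, with TP = sensitivity * n_cases and
    TN = specificity * n_controls.
    """
    true_pos = sensitivity * n_cases
    true_neg = specificity * n_controls
    return (true_pos + true_neg) / (n_cases + n_controls)


# Stacking model, category A validation (Table 3): Sen 66.4%, Spe 86.3%,
# 610 CRC cases vs. 2160 cancer-free individuals.
acc = accuracy_from_rates(0.664, 0.863, 610, 2160)
acc_percent = round(acc * 100, 1)
```

Because cancer-free individuals heavily outnumber CRC cases in every dataset, accuracy tracks specificity closely, which is why the SVM reaches the highest accuracy in the tuning set despite its low sensitivity.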

Discussion

Our CRC risk prediction model achieved an AUC of 0.71–0.76 in external validation, with 54.4–66.4% sensitivity across the tuning and validation sets and 41% sensitivity for Stage I CRC (Table 3). The sensitivity of 57.6% for stages I–III approaches the 64.6% sensitivity reported for commercial fecal immunochemical tests,⁷ suggesting CBC-based testing could identify over half of early-stage CRC cases without additional interventions.

The selection of stable features and our risk-stratified modeling approach represent key points in our study. Unlike traditional CBC-based models focusing primarily on iron-deficiency anemia markers (e.g., HGB, HCT), we incorporated five inflammatory response indices (NLR, MLR, PLR, NPLR, MPLR) derived from lymphocyte counts – which ranked among the top 5 most predictive individual features across all datasets (Figure 2). This dual focus aligns with current understanding of CRC pathophysiology, where both anemia¹⁶ and inflammation¹⁷ play significant roles. While previous studies have reported diagnostic value in various CBC components,^16,18–23 we observed notable inconsistencies in feature performance between our category A and B datasets, particularly for WBC count and MCV. These variations, potentially attributable to regional or technical differences, underscore the challenges in developing generalizable CBC-based detection models.

Our stacked modeling approach specifically addresses technical variations between hematology analyzers (Table 1), which significantly impacted both individual feature performance (Table S2) and model predictions (Figure 2). We constructed multiple random forest sub-models with varying specificity by adjusting the positive:negative subsampling ratio for CRC risk stratification. Although the highest-specificity model exhibited the lowest sensitivity, it achieved the lowest false-positive rate, and subjects predicted as positive by this model carried the highest CRC risk. The specificity threshold was progressively reduced, with the lowest-tier sub-model set at 80% to meet clinical requirements. Additionally, we incorporated HCT—a marker of iron-deficiency anemia and high CRC risk—as a gatekeeper for the entire model. Even if a subject tested negative across all sub-models, the system would recommend colonoscopy if HCT levels suggested iron-deficiency anemia.
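The tiered decision logic described above, with hematocrit acting as a final gatekeeper, might look roughly like the sketch below. It is a hypothetical illustration only: the function name, the representation of sub-model outputs as booleans, and the HCT threshold are all assumptions, and the actual stacking framework is described in the Supplementary information.

```python
def recommend_colonoscopy(submodel_flags, hct_percent, hct_threshold):
    """Return (recommend, reason) for one subject.

    submodel_flags: booleans ordered from the highest- to the lowest-specificity
    sub-model (hypothetical representation).
    hct_threshold: HCT value (%) below which anemia is suspected; the caller
    must supply it, since the paper does not state a cutoff.
    """
    for tier, flagged in enumerate(submodel_flags, start=1):
        if flagged:
            # Earlier tiers are more specific, so a flag there implies higher risk.
            return True, f"flagged by sub-model tier {tier}"
    if hct_percent < hct_threshold:
        # Gatekeeper: negative on every sub-model but HCT suggests anemia.
        return True, "HCT gatekeeper"
    return False, "no flag"


# Placeholder threshold of 36.0% is arbitrary, not a clinical cutoff.
result_gate = recommend_colonoscopy([False, False, False], 33.0, 36.0)
result_tier = recommend_colonoscopy([False, True, False], 42.0, 36.0)
result_none = recommend_colonoscopy([False, False, False], 42.0, 36.0)
```

Ordering the checks from the most to the least specific sub-model means the first tier that fires also indicates how strongly the subject was flagged.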

Compared to ColonFlag,²¹ the only other multinationally validated CBC model, our approach offers several distinctions. While ColonFlag relies heavily on age and anemia indicators, our model integrates inflammatory markers and shows superior performance in datasets where age demonstrates lower predictive value. Our CRC detection model represents a comprehensive assessment of immune abnormalities, hypercoagulability, and anemia characteristics within the body. Like ColonFlag,^22–26 our model could potentially identify CRC cases missed by FOBT and improve screening participation through EHR integration. Although our model has a false-positive rate of about 20% (Table 3), some of these flagged individuals may have other conditions, such as iron-deficiency anemia or immune abnormalities, that the CRC risk prediction model brings to attention.

Study limitations

The retrospective design necessitates prospective validation against FOBT to assess real-world impact on screening rates. Although the model addresses technical variations, regional and ethnic heterogeneity could not be fully assessed in our China-only dataset and requires further evaluation.

Conclusions

Our study demonstrates that comprehensive CBC analysis can provide an accessible, low-cost approach for CRC risk stratification, detecting 65% of cases including early-stage disease. While insufficient as a standalone diagnostic, this approach could significantly improve screening participation by (1) serving as a non-invasive first-line screening tool; (2) identifying high-risk individuals for colonoscopy referral; and (3) maintaining generalizability across different laboratory systems. Future work should focus on prospective validation in diverse populations and integration with existing screening programs.

Supplemental Material

sj-docx-1-dhj-10.1177_20552076251362072 - Supplemental material for Stacked random forest model for colorectal cancer detection using complete blood counts

Supplemental material, sj-docx-1-dhj-10.1177_20552076251362072 for Stacked random forest model for colorectal cancer detection using complete blood counts by Junfeng Luo, Weiwei Tan, Shaobo Chen, Yijing Chen, Ya Fu, Xiaojuan Jing, Lingling Kang, Qingyun Li, Zhenjian Ma, Tingji Sun, Peng Xiao, Shigui Xue, Xiaozhi Wang and Houde Zhang in DIGITAL HEALTH

Footnotes

Acknowledgements

We thank all study participants and their families, and the investigators and staff of the following hospitals: Nanshan Hospital, Shenzhen Guangming District People's Hospital, The People's Hospital of Longhua, Shenzhen Bao'an People's Hospital, Sichuan Suining Central Hospital, The University of Hong Kong-Shenzhen Hospital, Shantou Central Hospital, and Shanghai Shuguang Hospital.

ORCID iD

Houde Zhang

Ethical approval

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of the Nanshan Hospital (KY-2021-006-01, 26 April 2021). Informed consent is not applicable for this retrospective study.

Contributorship

Conceptualization, H.Z. and J.L.; methodology, H.Z., J.L. and X.W.; validation, X.W. and J.L.; formal analysis, J.L.; data curation, J.L., W.T., S.C., Y.C., Y.F., X.J., L.K., Q.L., Z.M., T.S., P.X. and S.X.; writing—original draft preparation, H.Z. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Municipal Financial Subsidy of Nanshan District Medical Key Discipline Construction.

Declaration of conflicting interests

The authors declare no conflicts of interest.

Data availability statement

Owing to legal restrictions, the data presented in this study are available only on request from the corresponding author.

Supplemental Material

Supplemental material for this article is available online.

References

1. Han, Zheng, Zeng, et al. Cancer incidence and mortality in China, 2022. Journal of the National Cancer Center 2024; 4: 47–53.

2. Zheng, Schrijvers, Greuter, et al. Effectiveness of colorectal cancer (CRC) screening on all-cause and CRC-specific mortality reduction: a systematic review and meta-analysis. Cancers (Basel) 2023; 15: 1948.

3. Yang, Huang, et al. A global view of adherence to colonoscopy follow-up in cascade screening of colorectal cancer. Eur J Cancer Care 2022; 31: e13577.

4. Chen H-D, Ren J-S, et al. Adherence to screening colonoscopy and its influencing factors in China: a multicentre population-based cross-sectional study. Lancet 2017; 390: S22.

5. Chen, Ren, et al. Participation and yield of a population-based colorectal cancer screening programme in China. Gut 2019; 68: 1450–1457.

6. Chung, Gray, Singh, et al. A cell-free DNA blood-based test for colorectal cancer screening. N Engl J Med 2024; 390: 973–983.

7. Imperiale, Porter, Zella, et al. Next-generation multitarget stool DNA test for colorectal cancer screening. N Engl J Med 2024; 390: 984–993.

8. Miles, Luu, Ong, et al. Associations between non-anaemic iron deficiency and outcomes following elective surgery for colorectal cancer: a prospective cohort study. Anaesthesia 2025; 80: 48–58.

9. Crooks, West, Jones, et al. COLOFIT: development and internal-external validation of models using age, sex, faecal immunochemical and blood tests to optimise diagnosis of colorectal cancer in symptomatic patients. Aliment Pharmacol Ther 2025; 61: 852–864.

10. Zhang J-X, Zhang J-J, et al. The prognostic value of the neutrophil-to-lymphocyte ratio (NLR) and platelet-to-lymphocyte ratio (PLR) in colorectal cancer and colorectal anastomotic leakage patients: a retrospective study. BMC Surg 2025; 25: 57.

11. Virdee, Marian, Mansouri, et al. The full blood count blood test for colorectal cancer detection: a systematic review, meta-analysis, and critical appraisal. Cancers (Basel) 2020; 12: 2348.

12. Zhou, et al. PCA outperforms popular hidden variable inference methods for molecular QTL mapping. Genome Biol 2022; 23: 210. DOI: https://doi.org/10.1186/s13059-022-02761-4

13. Mohammed, Mwambi, Mboya, et al. A stacking ensemble deep learning approach to cancer type classification based on TCGA data. Sci Rep 2021; 11: 15626.

14. Ahrens, Hansen, Schaffer. Pystacked: stacking generalization and machine learning in Stata. Stata J 2023; 23: 909–931.

15. Zhou, Zuo. Making class bias useful: a strategy of learning from imbalanced data. In: International conference on intelligent data engineering and automated learning. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007. pp.287–295.

16. Siddique, Patel, et al. AGA clinical practice guidelines on the gastrointestinal evaluation of iron deficiency anemia. Gastroenterology 2020; 159: 1085–1094.

17. Templeton, McNamara, Šeruga, et al. Prognostic role of neutrophil-to-lymphocyte ratio in solid tumors: a systematic review and meta-analysis. J Natl Cancer Inst 2014; 106: dju124.

18. Çakmak, Soylu, Yonem, et al. Neutrophil-to-lymphocyte ratio, platelet-to-lymphocyte ratio, and red blood cell distribution width as new biomarkers in patients with colorectal cancer. Erciyes Medical Journal 2017; 39: 131.

19. Boursi, Mamtani, Hwang W-T, et al. A risk prediction model for sporadic CRC based on routine lab results. Dig Dis Sci 2016; 61: 2076–2086.

20. Jianying. The preliminary study of nucleated red blood cell counting by automated hematology analyzer. Sysmex Journal International 2004: 14.

21. Kinar, Kalkstein, Akiva, et al. Development and validation of a predictive model for detection of colorectal cancer in primary care by analysis of complete blood counts: a binational retrospective study. J Am Med Inform Assoc 2016; 23: 879–890.

22. Kinar, Akiva, Choman, et al. Performance analysis of a machine learning flagging system used to identify a group of individuals at a high risk for colorectal cancer. PLoS One 2017; 12: e0171759.

23. Goshen, Choman, Ran, et al. Computer-assisted flagging of individuals at high risk of colorectal cancer in a large health maintenance organization using the ColonFlag test. JCO Clinical Cancer Informatics 2018; 2: 1–8.

24. Hornbrook, Goshen, Choman, et al. Early colorectal cancer detected by machine learning model using gender, age, and complete blood count data. Dig Dis Sci 2017; 62: 2719–2727.

25. Ayling, Wong, Cotter. Use of ColonFlag score for prioritisation of endoscopy in colorectal cancer. BMJ Open Gastroenterol 2021; 8: e000639.

26. Holt, Virdee, Bankhead, et al. Early detection of colorectal cancer using symptoms and the ColonFlag: case-control and cohort studies. NIHR Open Research 2023; 3.
