Abstract
Patients with cancer are at high risk of developing venous thromboembolism (VTE). This risk can be mitigated with prophylactic anticoagulants, so risk assessment models are useful for identifying the patients at higher risk who stand to benefit most from prophylaxis. This study retrospectively examined 528 patients with newly diagnosed colorectal cancer from January 2019 to January 2021. Logistic regression models were used to screen candidate factors, and a nomogram-based prediction tool was built from the variables finally included. Model performance was assessed in terms of discrimination, calibration, and clinical applicability. Internal validation was conducted through 10-fold cross-validation, leave-one-out cross-validation, and bootstrap validation. Univariate and multivariate logistic regression identified 4 risk factors closely related to the occurrence of VTE in patients with colorectal cancer: age, body mass index, activated partial thromboplastin time, and D-dimer value. The risk assessment model built on these factors, named ABAD, displayed good discrimination and calibration. The area under the curve was 0.705 (95% confidence interval [CI], 0.644 to 0.766). The Hosmer–Lemeshow goodness-of-fit test indicated good agreement between the predicted and observed VTE events in patients with newly diagnosed colorectal cancer (χ² = 6.864, P = .551). In internal validation, the C-index was 0.669 in 10-fold cross-validation, 0.658 in leave-one-out cross-validation, and 0.684 in bootstrap validation. We developed a prediction model called ABAD that can be used to predict the risk of VTE in patients with newly diagnosed colorectal cancer. After evaluation and internal validation, we believe that ABAD exhibits good predictive performance and practicality and can be recommended.
Introduction
Patients with cancer are at least 4 to 7 times more likely than patients without cancer to develop venous thromboembolism (VTE), 1 which includes deep vein thrombosis (DVT) and pulmonary embolism (PE).2,3 Evidence-based medicine has shown that VTE has a negative impact on the survival and overall quality of life of patients with cancer. However, VTE is a potentially preventable disease, 4 and current guidelines recommend low-molecular-weight heparin (LMWH) or direct oral anticoagulants (DOACs) for patients at high risk.5,6 It has also been speculated that anticoagulants may improve survival in patients with cancer through anti-tumor effects in addition to their anti-thrombotic effects.7‐9 Unfortunately, the risk of significant bleeding is also 2 to 3 times higher in patients with cancer on anticoagulation than in people without cancer, 7 so stratified treatment is particularly important. Risk assessment models are needed to predict the risk of VTE in the early stage of cancer so that prophylactic anticoagulation can be considered for high-risk patients. For the low-risk group, routine follow-up can replace anticoagulation, avoiding unnecessary anticoagulant-related bleeding.
The risk of VTE varies considerably among patients with cancer, depending on the patient, the malignancy, and treatment-related variables. 10 Several investigative studies suggested that the risk of venous thrombosis was highest in the first few months following the diagnosis of malignancy.11,12 Although several scoring systems,13‐18 for example, the Khorana Risk Score (KRS) 13 and the Caprini model, 16 have been developed to predict the occurrence of VTE in patients with cancer, most of them merely included the tumor site as one of the model's predictors, making it difficult to accurately define or assign specific prognostic indicators and biomarkers in comprehensive models. To further increase the predictability of prediction models for VTE in the oncological setting, specific scores corresponding to various tumor types must be developed. Currently, no VTE prediction model specifically for colorectal cancer patients has been proposed, and only a few studies have included some colorectal cancer populations. Among them, the KRS model 13 included 297 colorectal cancer cases, accounting for 11% of the total population. Approximately 170 colorectal cancer patients were included in the CATS study 17 in 2018 and the COMPASS-CAT study 18 in 2016, respectively.
Our research aimed to establish a tool to predict the risk of VTE in patients with newly diagnosed colorectal cancer. In clinical decision-making, anticoagulant drugs can then be given to patients with colorectal cancer at high risk of VTE to reduce the incidence of VTE and improve survival. For low-risk patients, regular follow-up can be conducted to improve clinical outcomes.
Patients and Methods
Study Design and Participants
A retrospective study was conducted on a cohort of patients who were hospitalized with newly diagnosed colorectal cancer between January 2019 and January 2021 at Beijing Shijitan Hospital (China). The inclusion criteria were (1) age over 18 years, (2) histologically confirmed colorectal cancer, and (3) no history of other cancers or previous anticancer therapy. The exclusion criteria were (1) venous or arterial thromboembolism within the past 3 months, (2) continuous or direct use of anticoagulants (eg, unfractionated heparin, low-molecular-weight heparin, and DOACs), (3) use of red blood cell growth factors, and (4) pregnancy. The selection of the study population is shown in Figure 1. The sample size followed the rule that the number of events should be at least 10 times the number of variables included in the model. Our research was conducted in accordance with the Declaration of Helsinki, and patient confidentiality was maintained.

Figure 1. Flow chart of criteria used to select the patients for inclusion.
Collection of Data
Data were collected from all eligible patients, including demographic factors (such as age and sex), tumor-related pathology and staging, concomitant medical conditions, and laboratory tests. The laboratory tests comprised the following common tests: complete blood count, coagulation profile, and biochemical tests (Table 1). VTE, which includes DVT and PE, was diagnosed by lower extremity vascular ultrasound and computed tomography pulmonary angiography.
Table 1. Characteristics of Participants With Newly Diagnosed Colorectal Tumors.
Notes: Values are presented as mean ± standard deviation or number (%).
CHD, coronary heart disease; BMI, body mass index; PT, prothrombin time; INR, international normalized ratio; FIB, fibrinogen; APTT, activated partial thromboplastin time; TT, thrombin time; FDP, fibrinogen degradation products; GLU, glucose; UA, uric acid; TC, total cholesterol; TG, triglyceride; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; ApoA1, apolipoprotein A1; ApoB, apolipoprotein B; LPa, lipoprotein(a).
Establishment and Evaluation of the Model
The clinical prediction model for tumor-associated VTE was constructed in 4 steps: screening of risk factors, establishment of the prediction model, evaluation of the prediction model, and internal validation. Univariate and multivariate logistic regression were used to screen risk factors. A nomogram was selected to present the prediction model of VTE because nomograms make complicated mathematical models easier to interpret than conventional statistical presentations. 19 We assessed the developed model on 3 criteria: calibration, discrimination, and clinical applicability. Finally, internal validation was conducted using 3 methods: 10-fold cross-validation, leave-one-out cross-validation, and bootstrap validation.
Statistical Analysis
Continuous variables were compared using the t-test and presented as mean ± SD. Categorical variables were expressed as numbers (percentages) and compared using the χ² test.
Model establishment and evaluation adhered to Harrell’s Regression Modeling Strategies, Steyerberg’s guidelines for clinical prediction models, and the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. 20 Univariate and multivariate logistic regression were used to screen risk factors, and a P-value < .05 was considered significant. Model performance was assessed through discrimination (the capacity to accurately differentiate the 2 classes of outcome) and calibration (the agreement between predicted and observed VTE). Discrimination was evaluated by calculating the area under the receiver operating characteristic (ROC) curve (AUC).20,21 Calibration was assessed by the calibration curve and the Hosmer–Lemeshow goodness-of-fit test21,22 to determine how well the predicted probabilities agreed numerically with the observed outcomes. To further assess the extent to which patients would benefit from applying our predictive model, we also plotted the clinical decision curve and the clinical influence curve. The internal validity of the prediction equation was evaluated by the 10-fold cross-validation technique, 23 the leave-one-out cross-validation technique, and the bootstrap technique. 21 Statistical analysis was conducted in R software version 4.1.2 (https://www.R-project.org).
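As a minimal sketch of the discrimination measure described above, the AUC can be read as the probability that a randomly chosen patient with VTE receives a higher predicted risk than a randomly chosen patient without VTE, with ties counted as one half. The function name `auc` and the toy values below are illustrative assumptions and not study data; the study itself performed its analysis in R.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # event ranked above non-event
            elif p == n:
                wins += 0.5      # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and outcomes (1 = VTE, 0 = no VTE).
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
```

For a binary outcome this quantity coincides with the C-index, which is why the two terms can be used side by side when reporting a logistic model.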
Results
Participant Characteristics
A total of 528 adult patients with newly diagnosed colorectal cancer were enrolled. The median age was 63 years (IQR 59–69 years), and 44.5% were male. The follow-up time was 12 months, and VTE occurred in 85 (16%) of the 528 subjects, comprising deep vein thrombosis of the lower limbs (n = 83) and PE (n = 2). Ninety-three percent of the VTE cases occurred within 1 month of the cancer diagnosis and 99% within 3 months. Detailed baseline characteristics of the study population are shown in Table 1.
Risk Factors for Thrombosis
First, we performed univariate logistic regression analysis to screen 41 clinical prognostic factors and biomarkers and identified 8 significant variables (Table 2): age, comorbid hypertension, hemoglobin, platelet count, red blood cell count, activated partial thromboplastin time (APTT), D-dimer (DD), and body mass index (BMI). These risk factors, all with a P-value < .05, were then included in the multivariate logistic regression analysis for further screening. In the multivariable analysis, age (OR = 1.036; 95% CI: 1.013–1.060; P = .002), BMI (OR = 0.905; 95% CI: 0.834–0.980; P = .016), APTT (OR = 0.903; 95% CI: 0.830–0.978; P = .014), and DD (OR = 1.000; 95% CI: 1.000–1.001; P = .016) were independent predictors of VTE (Table 3).
Table 2. Univariate Logistic Regression Analysis (P-Value < .05).
Note: BMI, body mass index; APTT, activated partial thromboplastin time.
Table 3. Multivariate Logistic Regression Analysis (P-Value < .05).
Note: BMI, body mass index; APTT, activated partial thromboplastin time.
Development of Predictive Models
To predict the risk of VTE in patients with newly diagnosed colorectal cancer, we established a risk assessment model, “ABAD,” with the 4 variables above: age, BMI, APTT, and DD (Figure 2). In this nomogram, each variable has a scale that represents its range of possible values. The points axis gives the score corresponding to each value of a variable, and the total points, obtained by adding the points of all variables, correspond to a patient's risk of VTE. The ROC curve (Figure 3A) was drawn to evaluate the discrimination of the model; the AUC of the nomogram “ABAD” was 0.705 (95% CI: 0.644–0.766). The Hosmer–Lemeshow goodness-of-fit test and the calibration curve (Figure 3B) were used to assess calibration, and good agreement between the predicted and actual VTE events in patients with colorectal cancer was observed (χ² = 6.864, P = .551). Further, the clinical decision curve (Figure 4A) and clinical influence curve (Figure 4B) were plotted to show the clinical application value. According to the decision-curve analysis, the model could be useful for identifying patients with a predicted VTE risk greater than 20–40% who would benefit from thromboprophylaxis.
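The reported Hosmer–Lemeshow result can be checked with a short calculation. The sketch below assumes the conventional grouping into 10 risk groups, which gives 8 degrees of freedom; the text does not state the number of groups, so this is an assumption. The helper `chi2_sf_even_df` is a hypothetical name that uses the closed-form upper tail of the chi-square distribution for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Reported Hosmer-Lemeshow statistic; df = 8 is an assumption (10 groups - 2).
hl_p = chi2_sf_even_df(6.864, 8)
```

Under that assumption the computed P-value is approximately .551, in agreement with the reported value. A large P-value here means the test found no evidence of miscalibration, which is weaker than a demonstration of good calibration.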

Figure 2. Nomogram “ABAD” based on multivariate screening. BMI, body mass index; APTT, activated partial thromboplastin time; DD, D-dimer.

Figure 3. The discrimination and calibration of the model “ABAD”. (A) ROC curve of the model “ABAD”; (B) calibration curve of the model “ABAD”. ROC, receiver operating characteristic.

Figure 4. The clinical application of the model “ABAD”. (A) Clinical decision curve; (B) clinical influence curve.
Validation of the Predictive Model
Three methods of internal validation were applied to evaluate the performance of the model: 10-fold cross-validation, leave-one-out cross-validation, and bootstrap validation. The corresponding C-index values were 0.669, 0.658, and 0.684, respectively (Table 4).
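The three resampling schemes differ only in how patients are assigned to the data used for fitting and the data used for evaluation. The following sketch shows that assignment step alone, for a cohort of 528 patients. The function names, the seed, and the use of an out-of-bag set for the bootstrap are illustrative assumptions; the study does not report its bootstrap variant or the number of resamples.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def bootstrap_indices(n, seed=0):
    """Draw n indices with replacement; unsampled rows form the out-of-bag set."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(sample))
    return sample, out_of_bag


n_patients = 528                                         # cohort size from the study
ten_fold = k_fold_indices(n_patients, 10)                # 10-fold cross-validation
leave_one_out = k_fold_indices(n_patients, n_patients)   # each fold holds one patient
boot_sample, boot_oob = bootstrap_indices(n_patients)    # one bootstrap resample
```

In each scheme the model would be refitted on the rows outside the test fold (or on the bootstrap sample) and the C-index computed on the held-out rows, then averaged over folds or resamples.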
Table 4. C-index and AUC of Model “ABAD” and Validation by Different Methods.
Note: AUC, area under curve of receiver operating characteristic.
Discussion
In this study, we retrospectively analyzed medical history and laboratory indicators routinely collected in the clinic and identified 4 parameters that were independently predictive of VTE in patients with newly diagnosed colorectal cancer, on which a risk assessment model was established (Figure 2). The model is available as a paper-based nomogram, was evaluated for discrimination and calibration, and was internally validated. With this model, clinicians can assess the risk of VTE within 1 year of diagnosis in patients with malignant colorectal tumors (especially within 3 months, as 99% of the VTE cases in our data occurred during this time) and develop different clinical strategies for patients at high and low risk. According to the decision curve analysis, applying the model to identify patients who would benefit from thromboprophylaxis would increase therapeutic benefit by lowering the risk of VTE.
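Decision curve analysis rests on the net benefit, which credits true positives and penalizes false positives by the odds of the chosen threshold probability. The sketch below implements the standard net benefit formula and the "treat all" reference strategy. The function names and the toy risks and outcomes are hypothetical and are not taken from the study.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: give prophylaxis to every patient."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Hypothetical predicted risks and outcomes (1 = VTE), for illustration only.
toy_probs = [0.05, 0.10, 0.30, 0.45, 0.60]
toy_labels = [0, 0, 0, 1, 1]
nb_model = net_benefit(toy_probs, toy_labels, 0.20)
nb_all = net_benefit_treat_all(toy_labels, 0.20)
```

A model is clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero, the net benefit of treating no one. In this toy example the model's net benefit at a 20% threshold is 0.35, compared with 0.25 for treating everyone.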
A number of prediction models for VTE in newly diagnosed patients with cancer exist; however, all have limitations and are insufficient for distinguishing high-risk from low-risk patients. Among them, the Khorana Score 13 has been validated and widely used. Its risk indicators include primary tumor site, platelet count ≥350 × 10⁹/L before chemotherapy, hemoglobin level < 100 g/L or treatment with a red blood cell growth factor, white blood cell (WBC) count > 11 × 10⁹/L, and BMI ≥35 kg/m² before chemotherapy. To date, a few researchers have evaluated the KRS model in colorectal cancer. In 2020, Sandro B et al examined the relationship between the KRS and VTE risk in 1380 patients with stage II to III colorectal cancer and found that the KRS did not predict VTE in this population at low-to-moderate thromboembolic risk. 24 In addition, an external validation of the Khorana score by Haltout et al in 2019, in a cohort that included breast cancer, lung cancer, colorectal cancer, and other cancers, found a sensitivity of 29% and a positive predictive value of only 15.0%. 25 These studies suggest that the KRS has limitations in predicting VTE risk in patients with colorectal cancer and that other risk assessment models should be investigated.
Some risk models were optimized by adding or modifying risk factors of the Khorana model. The Vienna model, 14 which adds DD and soluble P-selectin (sP-selectin), has a higher predictive ability than the Khorana score, and the incidence of thrombotic events in the predicted high-risk patients can reach up to 35% at 6 months after diagnosis. However, sP-selectin is rarely measured in clinical settings, which restricts the generalizability of this model. The CATS nomogram model 17 retained only tumor location and DD, the 2 most important components of the previous 2 models, which is convenient but leaves limited error tolerance. The Caprini model 16 is another widely used model; it includes 40 different risk factors, most of which relate to surgery, and does not consider tumor-related factors. Some studies have shown that it is also applicable to patients receiving medical treatment; 26 however, because it covers so many characteristics, takes a long time to complete, and raises medical expenditures, its practical applicability is constrained.
The incidence of VTE varies between different sites of the tumor, and the mechanism also varies greatly. However, the above prediction models only consider tumor site as one of the risk factors, which may mask the predictive role of some risk factors in specific tumors. Establishing different prediction models according to stratified tumor sites may be helpful to improve the prediction ability. Therefore, in this study, we attempted to establish a specific VTE risk assessment model for newly diagnosed colorectal cancer patients. In the screening of risk factors, on the one hand, we collected as many risk factors as possible based on existing clinical guidelines and studies. On the other hand, considering the simplicity and practicability of the model, we selected the factors that must be checked at the initial diagnosis of tumors, including general medical history, common complications, and routine laboratory tests such as blood routine examination and coagulation function. Based on univariate and multivariate analyses of 41 clinically relevant variables, we finally screened 4 common risk factors, including age, BMI, APTT, and DD value.
Consistent with some studies, older age11,12,25 was a predictive factor for the occurrence of VTE. In our research, each additional year of age increased the odds of VTE by 3.6%. By contrast, and to our surprise, BMI was negatively associated with the risk of VTE in both univariate and multivariate regression. The odds of VTE in colorectal cancer decreased by about 10% for every unit increase in BMI, which seems to run counter to the accepted belief that obesity 27 is a risk factor for VTE. On inspecting the data, we found that BMI was generally low in our study population: only 5.8% of patients had a BMI ≥ 28 kg/m², and half had a BMI < 24 kg/m². This may suggest that low body weight can also increase the risk of VTE, which is worthy of further investigation.
The APTT was shorter in patients with VTE than in those without. According to the multivariate logistic regression, each unit increase in APTT lowered the odds of VTE by about 10% (P < .05). An association of shorter APTT with thrombosis could be explained by resistance to activated protein C. 28 Indeed, APTT has been studied as an indicator of VTE risk in a variety of cancer entities, including brain cancer, breast cancer, and gastrointestinal cancer. 29 In 2018, Neil A et al found that a single APTT measurement below the median increased the risk of future VTE and that a low APTT added to the risk associated with other factors such as obesity and DD. 30 The DD test is widely available in clinical practice and has been validated across multiple cohorts for the exclusion of PE 31 in diagnostic settings and as an independent VTE risk factor 32 in prognostic settings. In addition, high DD levels are reported to be associated with poor prognosis in patients with cancer. 33 In our cohort, DD was an independent predictor of VTE (OR = 1.000; 95% CI: 1.000–1.001; P = .016).
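The percentage effects quoted for age, BMI, and APTT follow directly from the reported odds ratios. The sketch below makes the conversion explicit; `percent_change_in_odds` is a hypothetical helper name, and age is assumed to have been entered in years. The result is a change in odds, which approximates a change in risk only when the outcome is uncommon. The odds ratio for DD is reported as 1.000 after rounding, so its per-unit effect cannot be recovered this way.

```python
def percent_change_in_odds(odds_ratio, units=1.0):
    """Percent change in the odds of the outcome for a `units` increase."""
    return (odds_ratio ** units - 1.0) * 100.0


# Odds ratios reported in the multivariate analysis.
age_per_year = percent_change_in_odds(1.036)         # about +3.6% per year
bmi_per_unit = percent_change_in_odds(0.905)         # about -9.5% per BMI unit
aptt_per_unit = percent_change_in_odds(0.903)        # about -9.7% per APTT unit
age_per_decade = percent_change_in_odds(1.036, 10)   # effects compound multiplicatively
```

Because odds ratios multiply, a 10-year age difference corresponds to an increase in odds of roughly 42%, which is more than 10 times the per-year figure.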
The statistical challenges of establishing, deriving, and validating risk models merit discussion. To determine the number of cases, we followed the commonly accepted rule that the number of events should be at least 10 times the number of variables included in the model. 23 In our research, 85 patients developed VTE, and 8 variables were entered into the multivariate logistic regression. To present the prediction model, we included the 4 screened risk factors in a nomogram, which combines multiple indicators and applies a complex mathematical model to disease diagnosis and prediction. Compared with traditional statistical presentation, this technique is easier to interpret. 34 We also evaluated the model from 3 aspects: discrimination, calibration, and clinical usefulness. 19 In addition, internal validation was conducted through 10-fold cross-validation, leave-one-out cross-validation, and bootstrap validation to evaluate the applicability and generalizability of the model, enhance stability, and guard against overfitting.
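The events-per-variable rule cited above reduces to a single ratio. As a small worked check using the counts reported in this study (the function name `events_per_variable` is illustrative):

```python
def events_per_variable(n_events, n_variables):
    """Ratio used by the rule of thumb of at least 10 events per variable."""
    if n_variables <= 0:
        raise ValueError("need at least one candidate variable")
    return n_events / n_variables


vte_events = 85                                         # VTE events in the cohort
epv_multivariate = events_per_variable(vte_events, 8)   # 8 variables entered
epv_final = events_per_variable(vte_events, 4)          # 4 variables retained
meets_rule = epv_multivariate >= 10
```

With 85 events, the ratio is about 10.6 for the 8 variables entered into the multivariate model and about 21.3 for the 4 variables retained, so the rule of thumb is met in both cases. The ratio would be much lower if all 41 variables screened in the univariate step were counted as candidates.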
Our study has some limitations. Because of the constraints of retrospective research, we were unable to include some informative data that could be closely related to colorectal cancer. We anticipate that as genetics and bioinformatics advance, more predictive biomarkers will be identified and used, such as tumor genomic characteristics in the Tic-ONCO model 15 and sP-selectin in the Vienna CATS Score. 17 Additionally, we did not examine variables emerging during therapy, although these factors may well have affected the development of VTE, because our study aimed to support prevention of VTE in colorectal malignancies as soon as possible after diagnosis. Finally, because external validation is lacking, it is not known whether our results apply to other groups, so further studies in additional populations are needed. Overall, these deficiencies limit the clinical generalization of this prediction model, and further prospective studies with larger sample sizes and rigorous designs are needed. As a preliminary exploration of this research topic, we hope that this study can provide a reference for future prospective studies.
Conclusion
In our research, we established a risk assessment model called ABAD for VTE occurrence in patients with newly diagnosed colorectal malignancies, to help clinicians assess risk and take early, effective, and low-risk preventive measures. We hope that this model can be further validated and can provide a reference for future prospective studies.
Footnotes
Author Contributions
Study design: X-H W; statistical analysis and manuscript draft: W-J T; clinical data review and acquisition: CL, S-J Y, B-Y Z, M-L S, and Y-B L. All authors read and approved the final manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
This study was approved by the Research Ethics Boards of Beijing Shijitan Hospital, Beijing, China, on April 25, 2022 (ID: sjtky11-1x-2022(053)).
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and publication of this article: This study was supported by the Beijing Municipal Science and Technology Project (Z211100002521011) and the National Science and Technology Major Project (2017ZX09304026).
