Abstract
Objective
To investigate the risk factors of pulmonary embolism in patients with lung cancer and develop and validate a novel nomogram scoring system-based prediction model.
Method
We retrospectively analyzed the clinical data and laboratory characteristics of 900 patients with lung cancer who were treated, including patients with lung cancer without pulmonary embolism (LC) and patients with lung cancer with pulmonary embolism (LC + PE). The patients were randomly divided into derivation and internal validation groups in a 7:3 ratio. Using logistic regression analysis, a diagnostic model of the nomogram scoring system was developed by incorporating selected variables in the derivation group and validated in the internal and external validation groups (n = 108).
Result
Seven variables (adenocarcinoma, stage III-IV LC, indwelling central venous catheter, chemotherapy, and the levels of serum albumin, hemoglobin, and D-dimer) were identified as valuable parameters for developing the novel nomogram diagnostic model for differentiating patients with LC and LC + PE. The scoring system demonstrated good diagnostic performance in the derivation (area under the curve [AUC], 0.918; 95% confidence interval [CI], 0.893-0.943; sensitivity, 88.5%; specificity, 80.5%), internal validation (AUC, 0.921; 95% CI, 0.884-0.958; sensitivity, 90.5%; specificity, 80.4%), and external validation (AUC, 0.929; 95% CI, 0.875-0.983; sensitivity, 85.0%; specificity, 87.5%) groups.
Conclusion
In this study, we constructed and validated a nomogram scoring system based on 7 clinical parameters. The scoring system exhibits good accuracy and discrimination between patients with LC and LC + PE and can effectively predict the risk of PE in patients with LC.
Introduction
Currently, lung cancer (LC) is the malignancy with the highest incidence and mortality globally. 1 Pulmonary embolism (PE) is a serious disease of the pulmonary circulation caused by obstruction of the pulmonary arteries or their branches by emboli, most of which dislodge from the deep veins of the lower extremities. 2 The incidence of PE in patients with LC is reported to be 6 times higher than that in the population without malignancy. Furthermore, patients with LC are more likely to develop PE than patients with other types of solid tumors, and PE has a mortality risk second only to cancer. 3 However, owing to the lack of specific symptoms and signs, misdiagnosis and missed diagnosis of PE secondary to LC are common,4,5 which poses a great threat to the life of patients. Therefore, it is crucial to accurately predict the occurrence of PE in patients with LC. Although computed tomographic pulmonary angiography (CTPA) is currently the first-line diagnostic and evaluation method for PE, 6 it has some limitations, including poor sensitivity for diagnosing thrombosis in pulmonary artery subsegments or distal segments, and adverse effects, such as contrast agent-induced allergy or renal impairment. In addition, CTPA is time-consuming, expensive, and not available for bedside use during emergencies in emergency departments and primary care units. The current study therefore aimed to develop a simple, rapid, intuitive, safe, and reliable nomogram prediction model by exploring the risk factors for PE in patients with LC, and to validate its accuracy internally and externally, with the goal of reducing the incidence and mortality risk of PE in patients with LC and improving their quality of life and prognosis.
Materials and Methods
Patients and Study Design
Clinical data were retrospectively collected from patients who underwent bronchoscopy, had a final diagnosis of LC, which was confirmed via cytology or histopathology by percutaneous needle biopsy of the lung or surgically resected specimens, and were treated from May 2014 to May 2022. The flow chart of the study is shown in Figure 1.

Flow diagram of the overall procedures. (A) Ningbo First Hospital group. (B) The Affiliated People Hospital of Ningbo University group. Abbreviations: LC, patients with lung cancer without pulmonary embolism; LC + PE, patients with lung cancer with pulmonary embolism.
A total of 900 patients with LC from Ningbo First Hospital met the inclusion criteria and were randomly divided into derivation and internal validation groups in a 7:3 ratio. An external validation group from the Affiliated People's Hospital of Ningbo University was formed simultaneously. LC with PE was determined when the simplified Wells score 7 was more than 2 points and one of the following criteria was met: (a) pulmonary angiography indicating intrapulmonary contrast filling defect with or without the signs of blood flow occlusion with orbital signals or (b) CTPA showing the presence of hypodense filling defect in pulmonary arteries with nondistinct distal vessels and followed up (for at least 6 months). The exclusion criteria included (a) patients <18 years of age, (b) pregnant women, (c) history of PE, (d) other comorbid malignancies, and (e) incomplete data for analysis. The study was approved by the ethics committee of the study site and was conducted in accordance with the Declaration of Helsinki. Based on the retrospective and traceable nature of the study data, written informed consent was not required.
Data Collection
The following clinical and laboratory data were obtained from the clinical electronic record system: age; gender; body mass index; smoking status; pathological type; tumor, nodes, and metastases (TNM) stage; indwelling central venous catheter (CVC); treatment modalities (surgery, chemotherapy, radiotherapy, molecular targeted therapy, and immunotherapy); comorbidities (chronic obstructive pulmonary disease, hypertension, and diabetes); and serum levels (serum albumin, hemoglobin, and D-dimer). TNM stage refers to the international LC TNM staging proposed by the Union for International Cancer Control. 8 Youden's index was used to convert all laboratory variables and ratios to categorical variables based on the optimal cutoff value. Youden's index, also known as the correct index, is a measure of the validity of a diagnostic test. It can be used if false negatives (missed diagnoses) and false positives (misdiagnoses) are equally harmful. Youden's index is the sum of sensitivity and specificity minus 1. It represents the overall ability of the diagnostic method to separate true patients from nonpatients. The higher the index, the better the diagnostic test performs and the more valid it is.
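The cutoff selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `youden_cutoff`, the rule that values at or above the cutoff count as positive, and the marker values are all assumptions introduced here, and the data are hypothetical rather than study data.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing Youden's index J = sensitivity + specificity - 1.

    Assumption for this sketch: a value >= cutoff is classified as positive.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if v >= cutoff and y == 1)
        true_neg = sum(1 for v, y in zip(values, labels) if v < cutoff and y == 0)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        j = sensitivity + specificity - 1
        if j > best_j:  # keep the first cutoff that reaches the maximum
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical marker values (not study data); label 1 = PE, 0 = no PE.
marker = [120, 250, 310, 480, 520, 610, 700, 900]
has_pe = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff, j = youden_cutoff(marker, has_pe)
print(cutoff, round(j, 3))
```

Once the cutoff is chosen, each continuous laboratory value can be recoded as a binary variable (at or above the cutoff versus below it) before entering the regression.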
Statistical Analysis
Continuous variables were expressed as mean ± standard deviation and compared using the t-test or Mann–Whitney U-test. Categorical variables were expressed as quantity (n) and percentages (%) and compared using the chi-square test or Fisher's exact test. Univariate logistic regression analysis was employed to identify candidate risk factors in the derivation group, and all variables meeting the screening threshold (area under the curve [AUC] > 0.65) were candidates for multivariate analysis. Statistically significant variables were then identified via stepwise selection using the Akaike information criterion (AIC) in a multivariate regression model. The odds ratio (OR) was estimated and 95% confidence intervals (CIs) were provided. The rms package for R (version 4.0.5) was used to merge the selected variables into the nomograms to build the scoring system. Calibration curves and decision curve analysis (DCA) were also performed. Receiver operating characteristic curves and corresponding AUCs were calculated to determine the ability of the model to distinguish between patients with LC and those with LC + PE. In addition, sensitivity and specificity were determined to assess the diagnostic accuracy of the nomograms. All statistical analyses were performed using R (version 4.0.5; http://www.r-project.org) and SPSS 22.0. A 2-tailed P < .05 was considered significant.
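As an illustration of the discrimination measure used here, the AUC can be read as the probability that a randomly chosen patient with PE receives a higher score than a randomly chosen patient without PE, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auc_from_scores` and the scores are hypothetical and are not taken from the study, which used R for this step.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, which matches the Mann-Whitney U formulation.
    """
    positive_scores = [s for s, y in zip(scores, labels) if y == 1]
    negative_scores = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical total scores (not study data); label 1 = PE, 0 = no PE.
scores = [20, 51, 64, 100, 131, 164, 195, 288]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
auc = auc_from_scores(scores, labels)
print(auc)
```

The pairwise form is quadratic in sample size, which is fine for a cohort of this scale; rank-based formulas give the same value faster on large data.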
Results
Baseline Characteristics
The flow chart of the patient inclusion and exclusion processes used in this study is shown in Figure 1. A total of 900 eligible patients from Ningbo First Hospital were included and randomly divided into derivation (n = 649) and internal validation (n = 251) groups. The incidence of PE among patients with LC in the derivation group was 14.8% (96 of 649). In addition, 108 eligible patients from the Affiliated People's Hospital of Ningbo University comprised the external validation group. The demographic, clinical, and laboratory characteristics of the 3 patient groups are shown in Table 1.
The Clinical Characteristics of the Derivation Group, Internal Validation Group, and External Validation Group.
Abbreviations: BMI, body mass index; COPD, chronic obstructive pulmonary disease; TNM, tumor, nodes and metastases.
Logistic Regression Analyses in Patients With LC and LC + PE
With the occurrence of PE as the dependent variable and the items showing statistically significant differences in the univariate analysis as the independent variables, multivariate regression analysis was performed to establish an accurate prediction model. Seven variables were retained in the regression model using the AIC method to distinguish patients with LC from those with LC + PE. These variables (OR, 95% CI) were (Table 2): pathological type (1.870, 1.034-3.424), TNM stage (2.655, 1.449-4.997), indwelling CVC (7.745, 4.222-14.869), chemotherapy (1.864, 1.038-3.398), and the serum albumin (0.416, 0.226-0.754), hemoglobin (2.201, 1.213-4.031), and D-dimer (24.339, 12.338-52.605) levels.
Multivariate Logistic Regression Analysis of the Clinical Parameters in the Derivation Group.
Abbreviations: CI, confidence interval; OR, odds ratio; TNM, tumor, nodes and metastases.
Derivation and Validation of the Nomogram Prediction Model
Nomograms were developed based on the above 7 variables (Figure 2A). The calibration curves of the nomograms showed that the predicted lines overlapped well with the reference lines, indicating that the nomogram showed good performance in the derivation group (Figure 2B). Additionally, to validate the clinical utility of the model, DCA was applied to assess the net benefit of the diagnostic nomograms, revealing that patients would benefit more from the “treat-all” or “treat-none” strategy when the threshold probability was >0.4 (Figure 2C).
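For readers unfamiliar with DCA, the net benefit plotted on a decision curve at threshold probability p_t is commonly defined as TP/N − (FP/N) × p_t/(1 − p_t), and the "treat-all" reference line uses the same formula with every patient classified as positive. The sketch below is a minimal illustration of that standard definition, not the authors' R code; the function names and the predicted probabilities are hypothetical.

```python
def net_benefit(probabilities, labels, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(labels)
    true_pos = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probabilities, labels) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit when every patient is treated regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted risks (not study data); label 1 = PE, 0 = no PE.
predicted = [0.05, 0.10, 0.15, 0.30, 0.45, 0.60, 0.80, 0.90]
observed = [0, 0, 0, 0, 1, 0, 1, 1]
model_nb = net_benefit(predicted, observed, threshold=0.2)
treat_all_nb = net_benefit_treat_all(observed, threshold=0.2)
print(model_nb, treat_all_nb)
```

The "treat-none" strategy has a net benefit of 0 at every threshold, so a model is useful over the range of thresholds where its curve lies above both reference lines.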

Derivation and validation of the diagnostic nomogram. (A) Diagnostic nomogram for distinguishing LC from LC + PE in the derivation group. (B) Calibration curve of the nomogram. (C) Decision curve analysis of the nomogram. Abbreviations: LC, patients with lung cancer without pulmonary embolism; LC + PE, patients with lung cancer with pulmonary embolism; TNM, tumor, nodes and metastases.
Diagnostic Performance of the Scoring System in the Derivation and Validation Groups
In the derivation group, D-dimer levels exhibited the greatest impact on distinguishing patients with LC from those with LC + PE and were assigned 100 points in the model (Figure 2A). The 7 variables were converted to integer points as follows: pathological type (20 points), TNM stage (31 points), indwelling CVC (64 points), chemotherapy (20 points), and the serum albumin (28 points), hemoglobin (25 points), and D-dimer (100 points) levels (Table 3). In the derivation group, this scoring system demonstrated good performance in discriminating between patients with LC and those with LC + PE with an AUC of 0.918 (95% CI 0.893-0.943; Figure 3A and Table 4). The corresponding sensitivity and specificity were 88.5% and 80.5%, respectively (Table 4). The scoring system also showed good discriminative value in the internal and external validation groups with AUCs of 0.921 (95% CI 0.884-0.958; Figure 3B and Table 4) and 0.929 (95% CI 0.875-0.983), respectively (Figure 3C and Table 4). The sensitivity and specificity in the internal validation group were 90.5% and 80.4%, respectively, while those in the external validation group were 85.0% and 87.5%, respectively (Table 4). In addition, the calibration curves of the scoring system exhibited good agreement across the 3 data groups (Figure 3D-F), indicating that the model has good precision and discrimination.
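The point values quoted above can be used as a simple additive score, and they are roughly what one would expect if each binary predictor's points were proportional to the absolute log odds ratio, scaled so that the strongest predictor receives 100. The sketch below illustrates both ideas under stated assumptions: the dictionary keys and function names are hypothetical, the scaling rule is a general property of nomograms for binary predictors rather than something the text states, and the rule that points are earned when the risk category is present (for example adenocarcinoma or stage III-IV disease) is inferred because Table 3 is not reproduced here.

```python
import math

# Odds ratios as quoted in the text from the multivariate model (Table 2).
ODDS_RATIOS = {
    "pathological_type": 1.870,
    "tnm_stage": 2.655,
    "indwelling_cvc": 7.745,
    "chemotherapy": 1.864,
    "serum_albumin": 0.416,
    "hemoglobin": 2.201,
    "d_dimer": 24.339,
}

# Integer points as quoted in the text for the scoring system (Table 3).
PUBLISHED_POINTS = {
    "pathological_type": 20,
    "tnm_stage": 31,
    "indwelling_cvc": 64,
    "chemotherapy": 20,
    "serum_albumin": 28,
    "hemoglobin": 25,
    "d_dimer": 100,
}


def points_from_odds_ratios(odds_ratios):
    """Scale |ln(OR)| so that the largest effect maps to 100 points."""
    log_effects = {name: abs(math.log(value)) for name, value in odds_ratios.items()}
    largest = max(log_effects.values())
    return {name: round(100 * effect / largest) for name, effect in log_effects.items()}


def total_score(risk_factors_present):
    """Sum the published points for the risk factors a patient has (assumed rule)."""
    return sum(PUBLISHED_POINTS[name] for name in risk_factors_present)


derived_points = points_from_odds_ratios(ODDS_RATIOS)
example_score = total_score(["indwelling_cvc", "d_dimer"])
print(derived_points)
print(example_score)
```

Points derived from the rounded odds ratios agree with the published values to within about one point; small differences are expected because the published points were presumably computed from unrounded coefficients.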

Discrimination and calibration of the scoring system for distinguishing LC and LC + PE. (A-C) ROC curves of the scoring system in the derivation group, internal validation group, and external validation group. (D-F) Calibration curves of the scoring system in the derivation group, internal validation group, and external validation group. Abbreviations: AUC, area under the curve; LC, patients with lung cancer without pulmonary embolism; LC + PE, patients with lung cancer with pulmonary embolism; ROC, receiver operating characteristic.
Diagnostic Nomogram Score Calculation.
Abbreviation: TNM, tumor, nodes and metastases.
Accuracy of the Prediction Score of the Nomogram for Differentiating LC from LC + PE.
Abbreviations: AUC, area under curve; LC, patients with lung cancer without pulmonary embolism; LC + PE, patients with lung cancer with pulmonary embolism; SE, sensitivity; SP, specificity.
Discussion
LC is one of the most common malignancies associated with PE. A meta-analysis of 41 studies reported a total incidence of LC-related PE of about 3.7% (1172 of 31294), with the incidence in individual studies ranging from 1.3% to 23.7%. 9 The incidence of PE among patients with LC in the current study was as high as 14.8% (96 of 649), indicating that patients with LC are at a high risk for developing PE. Due to the presence of asymptomatic PE and the low number of autopsies, the actual incidence may be much higher. Some studies have reported that the proportion of patients with partial occult PE fluctuates between 29.4% and 63.0%. 9
Studies have shown that multiple pathogenic mechanisms and risk factors affect PE secondary to LC.9,10 Vascular endothelial cell damage, altered blood flow status, and hypercoagulability are the 3 elements of thrombosis, and an imbalance between coagulation and fibrinolysis is the root cause of thrombosis. The interaction between tumors and the coagulation system is yet to be fully elucidated. The main mechanisms9,11,12 known so far are: (1) tumor cells directly invade or compress the vascular wall, causing vascular endothelial damage. In addition, once the tumor volume reaches a certain level, it leads to a hypoxic state in the body, which indirectly causes vascular endothelial damage. (2) Tumor cells release procoagulant factors, such as tissue factor and cancer procoagulant substances. (3) An imbalance of the anticoagulation/fibrinolytic system. (4) Cell activation and interactions between cytokines.
The results of the present study showed that adenocarcinoma, stage III-IV LC, indwelling CVC, chemotherapy, and the serum levels of albumin <30 g/L, hemoglobin ≥140 g/L, and D-dimer ≥500 mmol/L are independent risk factors for PE in patients with LC. Patients with lung adenocarcinoma are significantly more likely to develop PE than those with non-adenocarcinoma.13,14 Adenocarcinoma has a strong invasive ability and is prone to early metastasis. Tumor progression induces massive expression of tissue factors that readily activate platelets, causing thrombosis. 15 This study showed that the number of patients with lung adenocarcinoma in the group with PE was higher, and the difference was statistically significant (P = .0397), which was similar to previous studies. PE incidence is high in stage III-IV LC. Patients with stage III-IV LC exhibit a higher expression of cancer procoagulants than those with early-stage LC, and their blood is in a long-term hypercoagulable state, with higher levels of prothrombin and cytokines, which may be the main reason why patients with advanced cancer are more likely to develop comorbid PE. Nearly half of our patients had stage III-IV LC, and the later stage of LC was an important risk factor for PE (OR 2.655, 95% CI: 1.449-4.997, P = .0019), which is consistent with previous studies. 9 A meta-analysis showed that CVC was a risk factor for PE in patients with LC, 3 and that the increased risk of PE due to CVC may be related to factors such as puncture injury to the vascular endothelium, compression and distortion of the infusion tubing, and reduced activity after catheter insertion. Our results showed that an indwelling CVC was a risk factor for PE in patients with LC (OR 7.745, 95% CI: 4.222-14.869, P < .0001), which is consistent with another study. Chemotherapy is a definite risk factor for LC + PE, 16 especially platinum-containing chemotherapy regimens. 
Studies have shown that patients receiving chemotherapy have approximately 5.35 times the risk of developing PE compared with other patients. 17 Commonly used chemotherapeutic drugs can increase the procoagulant activity of endothelial cells through protein disulfide isomerase-dependent disulfide bond formation and tissue factor activation and can also directly damage endothelial cells, leading to thrombosis. 18 The results of this study are similar to those described in the previous literature on the role of chemotherapy in the occurrence of PE. Patients with LC are prone to hypoproteinemia. On the one hand, when the body is in a hypoalbuminemic state, the liver is stimulated to synthesize albumin and concurrently increases the synthesis of coagulation factors, resulting in a hypercoagulable state. 19 On the other hand, in the presence of hypoproteinemia, water accumulates in the tissue interstitial spaces, resulting in increased blood viscosity and a higher risk of thrombosis. Erythropoietin transcripts and proteins are expressed in tumor cells, and LC tumor cells can cause a noncompensatory increase in erythropoietin, causing abnormal hemoglobin elevation and greater resistance to blood flow, resulting in slow blood flow. 20 Our study corroborates previous studies, suggesting that hypoproteinemia and high hemoglobin are important risk factors for PE in patients with LC (P = .0041 and .0097). D-dimer levels are a specific marker of fibrinolysis, having high diagnostic value for thrombotic diseases, as D-dimer is a degradation product of fibrin. It can be assumed that D-dimer levels are positively correlated with the incidence of PE, and PE can be excluded if the D-dimer test is negative. 
In this study, it was found that the level of D-dimer was significantly increased in LC + PE group, and high D-dimer level was a risk factor for the development of PE in patients with LC, which was consistent with the conclusions of previous studies.13,21 Notably, although smoking is a high risk factor for LC as well as PE, it has not been shown to increase the incidence of LC + PE. 22
In this study, the 7 independent risk factors were integrated to establish a nomogram for predicting the risk of PE in patients with LC, and the prediction system was validated to have high accuracy and discrimination. Healthcare professionals can conduct the personalized prediction of comorbid PE incidence in patients with LC based on the assigned values of each risk factor. Except for uncontrollable factors, such as adenocarcinoma and stage III-IV LC, clinicians should strengthen the screening and management of controllable factors and minimize the placement of intravenous catheters and use of certain chemotherapeutic drugs that are likely to damage the vascular endothelium while ensuring the highest quality of life for patients. Patients with low serum albumin or elevated hemoglobin or D-dimer levels should be given increased clinical attention, an early diagnosis should be made in combination with hematological tests, and the prevention of PE in patients with LC should be actively ensured.
Conclusion
In this study, we developed and validated a nomogram scoring system based on 7 easily accessible parameters (adenocarcinoma, stage III-IV LC, indwelling CVC, chemotherapy, and the serum levels of albumin, hemoglobin, and D-dimer). The scoring system exhibited good diagnostic performance in differentiating between patients with LC and those with LC + PE. This can help clinicians screen patients with LC to assess for the presence of PE. However, because this study is a retrospective analysis, future multicenter, prospective studies are warranted to validate our results.
Supplemental Material
sj-docx-1-cat-10.1177_10760296231151696 - Supplemental material for Derivation and External Validation of a Risk Prediction Model for Pulmonary Embolism in Patients With Lung Cancer: A Large Retrospective Cohort Study
Supplemental material, sj-docx-1-cat-10.1177_10760296231151696 for Derivation and External Validation of a Risk Prediction Model for Pulmonary Embolism in Patients With Lung Cancer: A Large Retrospective Cohort Study by Ning Zhu, MD, Lei Zhang, MD, Shengping Gong, MD, Zhuanbo Luo, MD, Lei He, MD, Linfeng Wang, MD, Feng Qiu, MD, Weina Huang, MD, and Chao Cao, PhD in Clinical and Applied Thrombosis/Hemostasis
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
The study involving human participants was reviewed and approved by the Ethics Committee of Ningbo First Hospital (No. 2022-RS139) and the Institutional Ethics Committee of the Affiliated People Hospital of Ningbo University (No. 2022-Y-061). The requirement for written informed consent was waived by the Ethics Committees because of the retrospective nature of the study.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the Science and Technology Innovation 2025 Major Project of Ningbo (2019B10037).
Supplemental Material
Supplemental material for this article is available online.
References
