Abstract
Background
Prediction models for cancer-associated pulmonary embolism (PE) in lung cancer patients are scarce. This study aimed to develop and validate a novel model to accurately predict PE risk in this population.
Methods
A retrospective cohort (n = 476) was used to identify PE-related risk factors and construct a predictive nomogram using Cox regression. Validation was performed in a prospective cohort (n = 140). The model's performance was compared with the Khorana score.
Results
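The newly developed nomogram, termed POLE, showed good predictive accuracy in the retrospective cohort (AUC: 0.776, 95% CI 0.720-0.833) and in the prospective cohort (AUC: 0.762, 95% CI 0.678-0.845), where it outperformed the Khorana score (AUC: 0.560, 95% CI 0.443-0.676; DeLong test P < 0.001).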
Conclusions
The POLE model, based on seven clinical parameters, was developed and validated for predicting cancer-associated PE in lung cancer patients, demonstrating good predictive accuracy, calibration, and stability.
Introduction
Cancer-associated venous thromboembolism (VTE) is the second leading cause of death in cancer patients. 1 Lung cancer, one of the most common malignancies worldwide, 2 carries a particularly high risk of VTE. 3
Pulmonary embolism (PE), a potentially life-threatening form of VTE, was found 4 in 12.4% of deceased cancer patients, with PE-related deaths accounting for 6.7% of the total population. Among patients with autopsy-confirmed PE, lung cancer patients represented the largest proportion (27.2%). A meta-analysis 5 reported that the overall incidence of PE in lung cancer was 3.7%, ranging from 1.3% to 23.7%. First-time PE in lung cancer patients often signals severe disease and poor prognosis, 6 highlighting the need for early risk prediction.
Existing predictive models for cancer-associated thrombosis, such as the Khorana score, have limited accuracy in lung cancer patients. 7 Other lung cancer-specific risk scores, including the ROADMAP-CAT score 8 and Rising-VTE/NEJ score, 9 require further validation and are not specifically designed to predict PE. To address these gaps, we aimed to develop and validate a novel clinical scoring model to predict PE in lung cancer patients. This model is intended to enable earlier identification of high-risk individuals and timely implementation of effective prophylactic and therapeutic strategies.
Methods
Study Design
A retrospective cohort of lung cancer patients (n = 476) from a tertiary teaching hospital in Shanghai, China (January 2015–April 2021), was used to develop a predictive model for acute PE. A prospective cohort (n = 140, May 2021–May 2023) served as a validation set. All patients were followed for one year post-diagnosis. The model, designed for predicting Pulmonary embOlism in Lung cancEr, was named the POLE model. Its performance was compared with the Khorana score. 10
Study Population
Inclusion criteria: (a) pathologically or cytologically confirmed primary lung cancer; (b) age >18 years; (c) absence of other concurrent primary malignancies. Exclusion criteria: (a) loss to follow-up or significant gaps in clinical data; (b) presence of chronic thromboembolic disease; (c) clinically or radiologically suspected but unconfirmed PE; (d) pharmacological or radiotherapeutic oncological treatment received before presentation at our hospital.
The screening and diagnostic protocol for PE in lung cancer patients was as follows: suspected cases aged <50 years with a simplified Wells score ≥2 and D-dimer >0.5 mg/L, or aged ≥50 years with a simplified Wells score ≥2 and D-dimer > age × 0.01 mg/L, 9 underwent spiral CT pulmonary angiography or ventilation/perfusion scintigraphy, interpreted independently by at least two senior radiologists. PE was diagnosed according to established imaging criteria. 11 All patients received oncological treatment in accordance with oncology guidelines, 12,13 prophylactic anticoagulation when indicated, and regular PE risk assessment and imaging in accordance with PE, VTE, or cancer-associated VTE guidelines. 14 Final management decisions were made by attending physicians. Ethics approval (XHEC-D-2024-073) and consent procedures are described in the Declarations.
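The imaging referral rule described above can be sketched as a small function. This is only an illustration of the stated thresholds; the function and parameter names are hypothetical and are not taken from the study's own tooling.

```python
def d_dimer_threshold_mg_l(age_years: float) -> float:
    """D-dimer cutoff in mg/L: fixed 0.5 below age 50, age x 0.01 from age 50."""
    if age_years < 50:
        return 0.5
    return age_years * 0.01


def meets_imaging_criteria(age_years: float, wells_score: float,
                           d_dimer_mg_l: float) -> bool:
    """True when a suspected case would proceed to pulmonary vascular imaging.

    Requires a simplified Wells score >= 2 and a D-dimer above the
    age-dependent cutoff, as described in the protocol text.
    """
    return wells_score >= 2 and d_dimer_mg_l > d_dimer_threshold_mg_l(age_years)


if __name__ == "__main__":
    # A 70-year-old has a cutoff of about 0.7 mg/L rather than 0.5 mg/L.
    print(meets_imaging_criteria(70, 3, 0.65))  # below the age-adjusted cutoff
    print(meets_imaging_criteria(70, 3, 0.80))  # above the age-adjusted cutoff
```

The age-adjusted cutoff raises the D-dimer threshold in older patients, so the same D-dimer value can trigger imaging in a younger patient but not in an older one.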
Data Collection
In the retrospective cohort, 105 PE-related variables covering patient, cancer, and treatment factors at lung cancer diagnosis were extracted from the electronic medical record system (see
In the prospective cohort, general characteristics (age, sex, body mass index), variables for the POLE model and Khorana score, and pulmonary vascular imaging outcomes were collected from the same system.
Both cohorts applied identical inclusion/exclusion criteria, PE diagnostic procedures, and data extraction methods to ensure methodological consistency and minimize bias.
Statistical Analyses
Statistical analyses were performed using SPSS 26.0 and R 4.3.2. Variables with <20% missing data were imputed using the “mice” package; those with ≥20% missing data were excluded. 15 Normally distributed continuous variables were expressed as mean ± standard deviation and compared with t-tests; non-normally distributed variables as median (25th percentile, P25; 75th percentile, P75) and compared with the Mann-Whitney U-test; categorical variables as counts (n) and percentages (%) and compared with chi-square or Fisher's exact tests.
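The missing-data rule above (impute below 20% missingness, exclude at 20% or more) can be illustrated with a short sketch. The study used the R "mice" package for the imputation itself; the Python helper below only shows the screening step, and its name and data layout are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def split_by_missingness(
    columns: Dict[str, List[Optional[float]]],
    threshold: float = 0.20,
) -> Tuple[List[str], List[str]]:
    """Return (variables kept for imputation, variables excluded).

    A variable is excluded when its fraction of missing values (None)
    is at or above the threshold; otherwise it is kept.
    """
    keep: List[str] = []
    exclude: List[str] = []
    for name, values in columns.items():
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction >= threshold:
            exclude.append(name)
        else:
            keep.append(name)
    return keep, exclude


if __name__ == "__main__":
    # Toy data, not study data: 10% missing is kept, 20% missing is excluded.
    toy = {
        "d_dimer": [0.4, None, 0.9, 1.2, 0.3, 0.8, 0.5, 0.7, 1.1, 0.6],
        "phosphorus": [1.1, None, 1.0, 0.9, None, 1.2, 1.0, 1.3, 0.8, 1.1],
    }
    print(split_by_missingness(toy))
```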
In the training cohort, potential PE-related risk factors were screened by univariable Cox regression (P < 0.05), and further reduced by least absolute shrinkage and selection operator (LASSO) regression. The final predictors were analyzed using multivariable Cox regression. 16 In the Cox regression analyses, continuous variables were entered as linear terms to keep the model parsimonious, with proportional hazards and a log-linear functional form prespecified. The final predictive model was selected by minimizing the Akaike information criterion (AIC), 17 and a corresponding nomogram was constructed. Risk scores were calculated to stratify patients into high- and low-risk groups. Model performance was assessed with time-dependent ROC curves, area under the curve (AUC), and calibration plots; internal validation used 1000 bootstrap resamples. Time-dependent cumulative incidence curves compared PE occurrence between risk groups.
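For orientation, the standard textbook forms of the quantities named above are given below. The notation (covariates x_1..x_p, coefficients β_j, partial log-likelihood ℓ) is assumed here for illustration and is not taken from the study.

```latex
% Cox proportional hazards model with linear terms for covariates x_1..x_p
h(t \mid x) = h_0(t)\,\exp\!\Big(\sum_{j=1}^{p} \beta_j x_j\Big)

% LASSO-penalized estimate: partial log-likelihood \ell(\beta), tuning parameter \lambda
\hat{\beta}_{\lambda} = \arg\max_{\beta}\Big\{ \ell(\beta) - \lambda \sum_{j=1}^{p} |\beta_j| \Big\}

% Akaike information criterion for a model with k estimated coefficients
\mathrm{AIC} = 2k - 2\,\ell(\hat{\beta})
```

Larger values of λ shrink more coefficients to exactly zero, which is how LASSO reduces the candidate set; the AIC then trades goodness of fit against the number of retained coefficients.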
In the validation cohort, the new predictive model was compared with the Khorana score using ROC curve analysis and the DeLong test. Based on a meta-analysis, 5 the overall incidence rate of PE in lung cancer patients is 3.7%. With a permissible error margin of 0.05, at least 55 patients needed to be included in the validation group. The actual validation cohort included 140 patients, exceeding the minimum sample size needed to evaluate the model. We acknowledge that the 5% absolute error used in the original calculation may be relatively large compared with the event rate; however, the sample size was primarily determined by the number of available patients and the need to ensure a sufficient number of events for model validation. A P value of less than 0.05 was considered statistically significant. (Additional details regarding model construction and statistical analyses are provided in
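The minimum of 55 patients is consistent with the usual normal-approximation formula for estimating a single proportion, n = z² p (1 − p) / d². The text does not state which formula was used, so the formula and the 95% confidence level (z = 1.96) in the sketch below are assumptions.

```python
import math


def sample_size_for_proportion(p: float, margin: float, z: float = 1.96) -> int:
    """Minimum n to estimate a proportion p within +/- margin.

    Uses n = z^2 * p * (1 - p) / margin^2, rounded up. The default z of 1.96
    corresponds to a two-sided 95% confidence level (an assumption here).
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


if __name__ == "__main__":
    # Incidence 3.7% and absolute error margin 0.05, as reported in the text.
    print(sample_size_for_proportion(0.037, 0.05))
```

Because the required n grows with the inverse square of the margin, a tighter margin relative to the 3.7% event rate would call for a substantially larger cohort.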
Results
Demographics and Characteristics of Patients
According to the inclusion and exclusion criteria, 616 lung cancer patients were recruited. The process is shown in the flowchart (

Flow of patients from enrollment.
Baseline Characteristics of the Retrospective Cohort.
Abbreviation: PE: pulmonary embolism; AF: atrial fibrillation; CAD: coronary artery disease; HF: heart failure; TNM: tumor, nodes and metastases.
Baseline Characteristics of the Prospective Cohort.
Abbreviation: PE: pulmonary embolism.
Development of POLE Model
In the retrospective cohort, 20 potential predictors (P < 0.05) of PE were initially identified using univariable Cox regression (

Variable selection using least absolute shrinkage and selection operator (LASSO) regression. A, Selection of the tuning parameter (λ) in LASSO regression using 10-fold cross-validation via minimum criteria; in this study, the optimal λ was set at 0.044. The partial likelihood binomial deviance is plotted against log(λ). Dotted vertical lines indicate the minimum criteria and one standard error of the minimum criteria at the optimal λ, where variables are selected. B, LASSO coefficient profiles for clinical variables, each plotted against the log(λ) sequence. The dotted vertical line indicates the nonzero coefficients selected via 10-fold cross-validation. Eight variables with nonzero coefficients were selected: activated partial thromboplastin time (APTT), D-dimer, phosphorus, carbohydrate antigen 19-9 (CA19-9), history of lung cancer surgery, targeted therapy, age, and pathology (small cell carcinoma).

The nomogram “POLE” to predict the risk of pulmonary embolism in patients with lung cancer. To use it, draw a vertical line from each variable value to the axis labeled “Points” and sum the points for all variables. The total points on the bottom scales correspond to the 1-, 3-, 6-, and 9-month PE-free probabilities. Abbreviation: APTT: activated partial thromboplastin time; CA19-9: carbohydrate antigen 19-9; PE: pulmonary embolism.
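A nomogram built from a Cox model is typically a graphical rescaling of the model's linear predictor. A sketch of the usual mapping from covariates to PE-free probability follows; the symbols η, S_0, and the fitted coefficients are notation assumed for illustration, and the study's fitted values are not reproduced here.

```latex
% Linear predictor from the fitted Cox coefficients
\eta(x) = \sum_{j=1}^{p} \hat{\beta}_j x_j

% PE-free probability at time t, given the baseline PE-free probability S_0(t)
S(t \mid x) = S_0(t)^{\exp(\eta(x))}
```

Under this construction, a higher total point score corresponds to a larger η(x) and therefore a lower PE-free probability at each of the displayed time points.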
Multivariable Cox Regression Analysis of the Clinical Parameters in the Retrospective Cohort.
Abbreviation: HR: hazard ratio; CI: confidence interval.
Predictive Accuracy of the New Predictive Model for PE in the Retrospective Cohort
The ROC curve analysis demonstrated good predictive accuracy for the model in the retrospective cohort (AUC: 0.776, 95% CI 0.720-0.833, P < 0.001;
The optimal risk score cutoff, determined by the Youden index, was 2.2. Patients were stratified into high- and low-risk groups, and time-dependent cumulative incidence curves showed higher PE incidence in the high-risk group (Log-rank P < 0.001;

Cumulative pulmonary embolism event curves in lung cancer patients by POLE model in retrospective study. Abbreviation: PE: pulmonary embolism.
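The Youden index used to choose the risk score cutoff is J = sensitivity + specificity − 1, maximized over candidate cutoffs. The sketch below shows the idea on toy data with a binary outcome; it is not the study's code, the data are invented for illustration, and the function name is hypothetical.

```python
from typing import List, Tuple


def youden_cutoff(scores: List[float], events: List[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A patient is classified as high risk when score >= cutoff.
    Ties are resolved in favor of the lowest cutoff.
    """
    positives = sum(events)
    negatives = len(events) - positives
    best_cut, best_j = float("nan"), -1.0
    for cut in sorted(set(scores)):
        true_pos = sum(1 for s, e in zip(scores, events) if e == 1 and s >= cut)
        true_neg = sum(1 for s, e in zip(scores, events) if e == 0 and s < cut)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


if __name__ == "__main__":
    # Toy risk scores and PE indicators (1 = PE), for illustration only.
    toy_scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    toy_events = [0, 0, 1, 0, 1, 1]
    print(youden_cutoff(toy_scores, toy_events))
```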
Predictive Accuracy of the Novel Predictive Model for PE in the Prospective Cohort
The POLE model maintained good predictive accuracy in the prospective cohort (AUC: 0.762, 95% CI 0.678-0.845, P < 0.001) and outperformed the Khorana score (AUC: 0.560, 95% CI 0.443-0.676, P = 0.297; DeLong test P < 0.001;

Accuracy and stability of the lung cancer-associated pulmonary embolism risk predictive model. (A) ROC curves of the POLE model and Khorana score model for predicting pulmonary embolism in patients with lung cancer. (B) ROC curves of the pulmonary embolism prediction model at 1, 3, 6, and 9 months post-diagnosis of lung cancer. (C) AUC distribution of the new prediction model via internal resampling. Abbreviation: AUC: area under the curve; ROC: receiver operating characteristic.
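As a simplified illustration of the AUC and of the resampling idea behind panel C, the sketch below computes a plain binary-outcome AUC as the probability that a patient with PE has a higher score than a patient without PE, then bootstraps it. This is an assumption-laden simplification: the study reports time-dependent ROC curves, and its internal validation may refit the model in each resample, which this sketch does not do. All names and data are hypothetical.

```python
import random
from typing import List


def auc(scores: List[float], events: List[int]) -> float:
    """Concordance-based AUC: P(score of a case > score of a non-case), ties count 0.5."""
    cases = [s for s, e in zip(scores, events) if e == 1]
    controls = [s for s, e in zip(scores, events) if e == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def bootstrap_auc(scores: List[float], events: List[int],
                  n_resamples: int = 1000, seed: int = 0) -> List[float]:
    """AUCs from resampling patients with replacement.

    Resamples that contain only cases or only non-cases are redrawn,
    because the AUC is undefined for them.
    """
    rng = random.Random(seed)
    n = len(scores)
    results: List[float] = []
    while len(results) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_events = [events[i] for i in idx]
        if 0 < sum(resampled_events) < n:
            results.append(auc([scores[i] for i in idx], resampled_events))
    return results


if __name__ == "__main__":
    toy_scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    toy_events = [0, 0, 1, 0, 1, 1]
    print(auc(toy_scores, toy_events))
    print(len(bootstrap_auc(toy_scores, toy_events, n_resamples=200)))
```

The spread of the bootstrapped values gives a rough sense of how stable the point estimate is under resampling.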
Discussion
In this study, the POLE model was developed to predict PE in lung cancer using seven variables: APTT, D-dimer, serum phosphorus, CA 19-9, history of lung cancer surgery, targeted therapy, and age. Although the 95% CIs for age and serum phosphorus included 1, these variables were retained based on the AIC criterion to optimize overall model performance. This approach has been widely used in predictive modeling and is supported by a previous study. 17 The POLE model demonstrated good predictive accuracy, consistency, and stability in both cohorts and outperformed the Khorana score.
Among the study population, 124 patients (20.1%) developed acute PE during one-year follow-up: 90 cases (18.9%) in the retrospective cohort and 34 cases (24.3%) in the prospective cohort. Data on the annual incidence of PE in lung cancer patients are currently limited. Previous literature 5 has reported an incidence ranging from 1.3% to 23.7%, and the incidence of PE in our study falls within this range. Although follow-up periods were not explicitly stated in previous studies, the first six months after lung cancer diagnosis represent a high-risk period for PE.
The Khorana score, although widely used, shows limited predictive accuracy for lung cancer-specific PE. 18 This limitation may be due to the diverse cancer types and broader outcomes included in its development, reducing its specificity for lung cancer-associated PE. In contrast, our study focuses solely on lung cancer patients, with PE as the only outcome, enhancing specificity. This targeted approach provides clinicians with a more precise tool for identifying high-risk lung cancer patients, aiding early detection and informed clinical decisions.
Another model 19 also predicts PE risk in lung cancer using seven variables: adenocarcinoma, stage III-IV, central venous catheter, chemotherapy, serum albumin, hemoglobin, and D-dimer. Although internally and externally validated, it lacked a comparator model and employed logistic regression analysis, which may not fully capture the temporal relationship between lung cancer and PE risk.
Our study found that shortened APTT and elevated D-dimer levels are risk factors for PE in lung cancer patients. Previous studies also reported that shortened APTT is independently associated with increased VTE risk,20,21 consistent with our findings.
Phosphorus is an essential mineral involved in numerous biological processes, 22 primarily regulated by serum calcium concentrations and parathyroid hormone. 23 An animal study 24 found that a high-phosphorus diet promoted lung cancer development and altered Protein Kinase B (AKT) signaling. In our study, serum phosphorus correlated negatively with tumor stage (r = -0.134, P = 0.001), which may indicate that a higher tumor burden consumes more serum phosphorus. We found that reduced serum phosphorus was a risk factor for PE in lung cancer patients, potentially because of the heavier tumor burden associated with advanced tumor stage. 25 Furthermore, another study found a negative correlation between serum phosphorus and plasma levels of tissue plasminogen activator inhibitor-1 Ag (PAI-1Ag) in patients with primary hyperparathyroidism 26 (r = -0.453, P < 0.05), whereas no correlation was found between serum calcium and PAI-1. This suggests a possible link between phosphorus and coagulation in lung cancer, which warrants further investigation.
CA 19-9 is a serum biomarker primarily associated with gastrointestinal cancers, particularly pancreatic cancer. In lung adenocarcinoma, serum CA 19-9 levels correlate positively with lymph node involvement and distant metastasis, 27 and elevated CA 19-9 levels are linked to poorer progression-free and overall survival. 28 Our results suggested that higher CA 19-9 levels were associated with increased PE risk in lung cancer patients. While comparable studies are limited, studies in pancreatic cancer have found a correlation between CA 19-9 and patients’ coagulation function. Elevated CA 19-9 has been associated with thrombosis in pancreatic cancer, 29 and its doubling time may predict VTE occurrence. 30 The potential correlation between CA 19-9 and circulating mucin levels may underlie its association with thrombosis. 31 Moreover, a strong correlation has been reported between plasma microparticle-associated tissue factor activity and CA 19-9 levels, 32 and increases in CA 19-9 have been found to correlate with elevated plasma thrombin concentrations. 33 These findings may help clarify how CA 19-9 contributes to the risk of PE in lung cancer patients. However, further investigation is needed to confirm these hypotheses and elucidate the mechanisms by which CA 19-9 affects the coagulation system in this population.
Our study found that targeted therapy and surgical treatment were associated with the occurrence of PE in lung cancer patients. The targeted agents in our study were mainly directed at genetic alterations such as epidermal growth factor receptor (EGFR) mutations, anaplastic lymphoma kinase (ALK) fusions, and c-ros oncogene 1 (ROS1) fusions. A previous study found that non-small cell lung cancer patients with ALK/ROS1 rearrangements are more prone to thrombosis than those with other oncogenic alterations. 34 However, we could not perform a stratified analysis because of the small number of patients with ALK/ROS1 rearrangements, which may limit the reliability of our results. Nonetheless, our study suggested that targeted therapy may reduce PE risk, likely through its tumor-suppressive effect. Further large-sample studies are needed to verify whether specific gene mutations and their corresponding targeted therapies can reduce PE risk. Results on the impact of lung cancer surgery on venous thrombosis are inconsistent. 5,35 Our study suggested that surgery may reduce PE risk, possibly because patients who underwent surgery had earlier-stage tumors and better general health. We found no significant effect of chemotherapy, immunotherapy, or anti-angiogenic therapy on PE development in lung cancer patients, in contrast to some previous literature. 36–38 This discrepancy may be attributable to the combined treatment regimens in our cohort, which could obscure the relationship between chemotherapy and PE risk. Furthermore, our study included few patients treated with anti-angiogenic agents or immunotherapy, which may have limited its ability to detect an effect. Thus, additional large-scale, prospective studies are needed to confirm these findings and explore the relationship between anti-angiogenic therapy and PE risk in lung cancer.
Our study identified age as a risk factor for PE in lung cancer patients, with those older than 66 years being more susceptible (AUC: 0.605, 95% CI: 0.540-0.670, sensitivity: 0.567, specificity: 0.601, P = 0.002). This aligns with a previous study 39 that also found age over 66 years to be a potential risk factor for PE.
Although the mechanisms by which predictors such as serum phosphorus and CA 19-9 contribute to PE remain unclear, the POLE model's prospective validation and real-world data support its reliability and clinical applicability, highlighting its potential to guide risk stratification and individualized management of PE in lung cancer patients.
Limitations
The POLE model is one of the few specific models for predicting cancer-associated PE risk in lung cancer patients, offering clinical value. However, several limitations should be considered when interpreting these findings: First, the single-center design and modest sample size may limit the generalizability and application of the model. While the model is validated in our prospective cohort, future research should involve multicenter, prospective studies with larger cohorts for further validation and refinement. Second, although the retrospective and prospective cohorts were collected during different time periods, both adhered to the same inclusion and exclusion criteria, diagnostic definitions, and data collection procedures, which minimized potential bias arising from methodological differences. Nevertheless, we cannot completely exclude the influence of temporal changes in lung cancer treatment strategies or clinical practice patterns on the incidence of PE, which might have affected the model's performance in the validation cohort. In addition, early deaths and patients lost to follow-up without documented PE events were censored at the date of death or last follow-up, whichever came first, after which no further person-time was contributed. We acknowledge that this approach may introduce potential bias in estimating PE risk. Competing risk analysis, such as Fine-Gray models, could more accurately account for early deaths as competing events, and future studies will explore this approach to further validate and refine the POLE model. Third, the predictors in the POLE model were assessed at the time of lung cancer diagnosis, making it most applicable to newly diagnosed patients. Caution is needed when applying the model to patients who are not newly diagnosed. Additionally, transient risk factors for PE after diagnosis, such as trauma, infection, and new treatments, could impact the model's long-term predictive accuracy. 
Fourth, despite efforts to include all relevant risk factors for PE, some important factors may have been missed, which might affect the model's predictive accuracy. Moreover, we prespecified a linear functional form for continuous covariates to keep the model parsimonious; non-linear terms or fractional polynomials were not examined and could be explored in future work. Last, in the absence of a widely recognized model specific to PE risk in lung cancer patients, the Khorana score was selected as the comparator. However, it may not be an ideal comparator, since the Khorana score was designed to predict VTE risk across all cancer types rather than PE risk in lung cancer alone, which may limit its applicability to the latter task.
Conclusions
The POLE nomogram, which incorporates seven predictors (APTT, D-dimer, serum phosphorus, CA 19-9, history of lung cancer surgery, targeted therapy, and age), was constructed and validated in the current study for predicting cancer-associated PE in lung cancer patients. It demonstrated good predictive accuracy, consistency, and stability, and outperformed the Khorana score. Nevertheless, the model still needs to be validated in larger cohorts.
Supplemental Material
Supplemental material, sj-docx-1-cat-10.1177_10760296261428826 for POLE: Development and Validation of a Pulmonary Embolism Prediction Model in Lung Cancer by Dongmei Wang, Wei Xiong, Xuan Huang, Fan Zhang, Fengming Xu and Fengfeng Han in Clinical and Applied Thrombosis/Hemostasis
Footnotes
List of Abbreviations
Acknowledgements
We would like to thank all the patients being studied and those who have supported this research indirectly.
Ethics Approval and Consent to Participate
Ethical approval for this study was obtained from the local institutional review board (approval number XHEC-D-2024-073). Written informed consent from the participants or their next of kin in the retrospective cohort was waived because (1) the study involved no more than minimal risk to patients; (2) the study could not adversely affect the rights and welfare of patients; and (3) the study could not be performed without the exemption from informed consent. Written informed consent was obtained from the patients in the prospective cohort for their anonymized information to be published in this article.
Consent for Publication
Not applicable.
Author Contributions
Concept and design: FH, WX. Acquisition, analysis, or interpretation of data: all authors. Drafting of the manuscript: DW. Critical review of the manuscript for important intellectual content: all authors. Statistical analysis: all authors. Administrative, technical, or material support: all authors. Supervision: FH, WX. Dongmei Wang and Wei Xiong contributed equally to the work.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability and Materials
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Supplemental Material
Supplemental material for this article is available online.
References
