Abstract
Deep vein thrombosis (DVT) is a common complication after joint replacement and seriously affects patients' quality of life. We systematically searched nine databases and included eleven studies on models predicting DVT after knee/hip arthroplasty, from which eight prediction models were selected and compared. The network meta-analysis ranked the models as follows: XGBoost (SUCRA 100.0%), LASSO (SUCRA 84.8%), ANN (SUCRA 72.1%), SVM (SUCRA 53.0%), ensemble model (SUCRA 40.8%), RF (SUCRA 25.6%), LR (SUCRA 21.8%), and GBT (SUCRA 1.1%). The XGBoost model therefore showed the best predictive performance. Our study provides suggestions and directions for future research on DVT prediction models. Well-designed studies are still needed to validate this model.
Keywords
Background
Patients with knee and hip disorders commonly undergo Total Knee and Total Hip Arthroplasty (TKA and THA) to reduce pain and improve function. 1 It is a common surgical treatment for severe degeneration or injury to the knee and hip. Most patients recover well after surgery and can re-engage in daily activities and sports. However, knee and hip replacement also carries some risks and complications.
Deep vein thrombosis (DVT) is a preventable cause of morbidity and mortality in surgical patients and is considered a major public health and patient safety issue.2,3 DVT is also one of the major burdens of noncommunicable diseases worldwide, with approximately 10 million cases occurring each year, making it the third largest vascular disease after acute myocardial infarction and stroke. 4 DVT may not cause noticeable symptoms, but it sometimes presents with swelling, pain, redness, and raised local temperature. If DVT is left untreated, blood clots can break off and reach the lungs through the bloodstream, triggering pulmonary embolism, a severe complication that can be life-threatening. 5
The incidence of DVT after TKA and THA is high, ranging from 40% to 80% if preventive measures are not taken, and is one of the leading causes of perioperative mortality. 6 DVT is the most common cause of readmission in THA/TKA patients and may occur up to 3 months postoperatively.7,8 Therefore, prevention of DVT is crucial in knee and hip replacement, and it is clinically meaningful to predict whether patients will develop DVT. Predictive models can help healthcare providers optimize decision-making and estimate individual patient risk. 9
Several predictive models have been developed to help doctors assess a patient’s risk of developing DVT after knee/hip replacement surgery.10–20 However, because many DVT prediction models exist and comparative studies of their predictive accuracy are lacking, it is difficult to assess the accuracy of each model. 21 This diversity of model choices creates challenges and inconveniences for clinical staff. Therefore, our study aimed to compare prediction models targeting DVT after knee and hip arthroplasty to identify the most reliable model, which would provide a vital guide for clinical practice and assist in clinical decision-making.
It is crucial to acknowledge that, as of now, there are no universally accepted authoritative DVT prediction models that can serve as definitive guidelines for predicting DVT after knee/hip arthroplasty in clinical practice. A systematic evaluation and comparison of the accuracy and performance of these models is also lacking, which hinders evidence-based recommendation of the existing risk prediction models. The limited adoption of these models may be attributed to their reliance on small, non-prospective cohorts and the lack of external validation. Furthermore, the methodological quality of these models has yet to be rigorously evaluated, so unresolved issues continue to accumulate in the field. To confront these challenges and uncertainties, we conducted a systematic review with the primary objective of identifying all extant predictive models (either developed or validated). We aim to critically assess their attributes and predictive capabilities and thereby provide valuable insights for clinical applications.
Methods
This study was conducted in accordance with PRISMA guidelines, 22 encompassing adherence to eligibility criteria, implementation of the search strategy, study selection, data extraction, risk-of-bias assessment, and data analysis.
Search strategy
Nine databases were comprehensively and systematically searched for articles published in English or Chinese between 1 January 2000 and 15 April 2023: PubMed, Cochrane, Embase, Web of Science, China Science and Technology Journal Database (VIP), China National Knowledge Infrastructure Database (CNKI), Wan Fang database, China Biomedical Literature Database (CBM), and IEEE Xplore Digital Library (Supplement S1).
Study selection
Two independent reviewers conducted the study selection process, adhering to predefined inclusion and exclusion criteria. Initially, we imported all potentially relevant studies into EndNote to eliminate duplicate entries. Subsequently, we scrutinized titles and abstracts to exclude studies that did not meet our eligibility criteria. Articles that remained after this screening phase underwent a thorough full-text review. Any disagreements during the selection process were resolved through discussion or, if necessary, consultation with a third reviewer.
Inclusion and exclusion criteria
We incorporated studies related to the creation of prediction models, whether or not they underwent external validation. Additionally, we considered studies focusing on the external validation of models, whether or not model updating was involved. Excluded from consideration were articles solely available as abstracts, those lacking full-text versions, letters, protocols, reviews, and case reports.
Data extraction
Two independent reviewers extracted data from articles based on the CHARMS (critical appraisal and data extraction for systematic reviews of prediction modeling studies) checklist. 23 Disagreements between the two reviewers were resolved with the assistance of a third reviewer. The standardized data extraction table recorded the following information from each study: first author, time of publication, country, research type, research object, data source, sample size, number of models, follow-up time, prediction outcome, modeling method, candidate variables, missing data, model performance, predictors included in the model, and model presentation.
Quality and bias assessments
We systematically evaluated the risk of bias and the relevance of the included studies to the review question using the Prediction Model Risk of Bias Assessment Tool (PROBAST). 24 PROBAST assesses bias across four key domains: participants, predictors, outcomes, and analysis, assigning a rating of “high,” “low,” or “unclear” risk of bias to each domain. The outcomes of this bias assessment for each study were documented in a summary table. Studies determined to be of low quality were to be excluded from the meta-analysis.
Statistical analysis
Statistical analyses and plots were performed using Stata 17 software (STATA Corporation, College Station, TX) and RStudio. To evaluate convergence of the network meta-analysis model, we used the Potential Scale Reduction Factor (PSRF). A PSRF value near 1 suggests favorable convergence, allowing for reliable conclusions through the consistency model analysis. We created ranking probability plots to rank the prediction models. Additionally, network evidence plots were generated using Stata 17 to facilitate the comparison of differences among these models.
The area under the curve (AUC) was used to compare the performance of the predictive models. Random-effects models were used, and heterogeneity was assessed visually by means of forest plots and by reporting the I² statistic. Pooled AUCs and 95% confidence intervals were calculated. If heterogeneity was considered significant (I² > 70%), sensitivity analysis was conducted. Funnel plots were used to illustrate the risk of publication bias.
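To make the convergence criterion concrete, the following is a minimal sketch of the Gelman-Rubin PSRF for a single parameter, assuming several MCMC chains of equal length. The function name `psrf` and the toy chains are hypothetical; this is not the code used in the study, which relied on Stata and RStudio.

```python
from statistics import mean, variance


def psrf(chains):
    """Potential scale reduction factor for one parameter.

    chains: list of equal-length lists of posterior draws (assumed input).
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")

    chain_means = [mean(c) for c in chains]
    # W: average of the within-chain sample variances
    within = mean([variance(c) for c in chains])
    # B: between-chain variance, scaled by the chain length
    between = n * variance(chain_means)
    if within == 0:
        raise ValueError("within-chain variance is zero; PSRF is undefined")

    # Pooled estimate of the posterior variance
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Two chains exploring very different regions give a PSRF far above 1.
well_mixed = psrf([[0.1, 0.3, 0.2, 0.4], [0.2, 0.4, 0.1, 0.3]])
poorly_mixed = psrf([[0, 1, 0, 1], [10, 11, 10, 11]])
print(round(well_mixed, 3), round(poorly_mixed, 3))
```

Values close to 1 indicate that between-chain variation is small relative to within-chain variation, which is the sense in which the text treats a PSRF near 1 as favorable.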
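As an illustration of the pooling step, the sketch below computes a random-effects pooled estimate, its 95% confidence interval, and I² using the DerSimonian-Laird estimator. The choice of estimator, pooling on the raw AUC scale, and the example values are all assumptions made for illustration; the source does not state which estimator or transformation was used.

```python
import math


def pool_random_effects(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling (illustrative sketch)."""
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two estimates with matching SEs")

    # Fixed-effect (inverse-variance) weights and mean
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)

    # Cochran's Q and the between-study variance tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))

    # I^2: share of total variation attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical AUCs and standard errors, not data from the included studies.
result = pool_random_effects([0.72, 0.85, 0.91], [0.03, 0.02, 0.04])
print(result["i2"] > 70)  # would trigger the sensitivity analysis rule
```

In practice AUCs are often pooled on the logit scale to respect the 0 to 1 bounds; the raw scale is used here only to keep the sketch short.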
Results
Study selection
The initial literature search was conducted on 15 April 2023, and updated searches were conducted on 2 September 2023. A total of 2491 references were identified. Duplicates (n = 200), case reports, meetings, letters, and irrelevant studies based on their title (n = 2474) were excluded. Finally, 11 studies were included in this systematic review and meta-analysis10–20 (Figure 1).
Figure 1. Flow diagram.
Study characteristics
Publication years ranged from 2020 to 2023. Most studies (n = 9) were from China, and two were from the USA. All were retrospective cohort studies. Two studies targeted THA patients, four targeted TKA patients, and the remaining five included both THA and TKA patients. The number of participants used for developing the models varied from 100 to 392661, and the number of events ranged between 24 and 4042. DVT incidence ranged from 0.5% to 62.63%. Internal validation was undertaken in six studies through split-sample validation (n = 4), cross-validation (n = 2), or bootstrap resampling (n = 3). One study reported externally validated performance measures.
The reported AUCs of the studies ranged from 0.62 to 0.988. Of these, seven models had AUCs in the range of 0.6 to 0.7, indicating poor discriminatory ability. Two models had AUCs in the range of 0.7 to 0.8, indicating moderate discriminatory ability. Six models had AUCs in the range of 0.8 to 0.9, indicating good discriminatory ability. Eight models had AUCs in the range of 0.9 to 1, indicating excellent discriminatory ability.
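The banding used above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and treating each lower bound as inclusive is an assumption, since the text does not say how boundary values such as 0.7 were assigned.

```python
def discrimination_band(auc):
    """Map an AUC to the descriptive bands used in the text.

    Assumption: lower bounds are inclusive (e.g. 0.70 counts as moderate).
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "moderate"
    if auc >= 0.6:
        return "poor"
    return "below the reported bands"


# The extremes of the reported range fall in the lowest and highest bands.
print(discrimination_band(0.62), discrimination_band(0.988))
```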
In total, eight models were reported across the 11 studies: Logistic Regression-based models (LR) (n = 7), eXtreme Gradient Boosting (XGBoost) (n = 5), Random Forests (RF) (n = 3), Support Vector Machines (SVM) (n = 2), Artificial Neural Network (ANN) (n = 2), ensemble model (n = 2), Least Absolute Shrinkage and Selection Operator (LASSO) (n = 1), and Gradient Boosted Trees (GBT) (n = 1). Detailed characteristics of the included studies are presented in Supplement S2. The statistical analysis software used in these 11 studies included SPSS, R, and RStudio.
Predictors included in the models
The number of predictors retained in the final models ranged from 2 to 23. The most frequently used predictors were age (n = 8), fracture-fixation types (n = 6), D-dimer level after operation (n = 5), time from injury to operation (n = 4), use of anticoagulant drugs (n = 4), blood transfusion (n = 3), diabetes (n = 3), active and passive functional exercise (n = 2), use of bone cement (n = 2), coronary heart disease (n = 2), gender (n = 2), and smoking (n = 2). The predictors are shown in Figure 2.
Figure 2. The number of predictors.
Risk of bias and applicability
All the studies included in this analysis exhibited a high overall risk of bias based on the PROBAST assessment. In particular, the majority of the developed models were found to possess a high risk of bias within the analysis domain. It is noteworthy that most studies primarily reported the c-statistic as a performance metric for their models. However, calibration was either inadequately reported or, in many cases, not reported at all. Only three studies conducted the Hosmer-Lemeshow test to evaluate calibration. These combined issues contributed to the overall high risk of bias rating in the analysis domain for all developed models. Out of the total studies, fourteen received a low risk of bias rating, while six were deemed to have a high risk of bias in terms of applicability. Detailed information regarding the risk of bias and applicability assessments is presented in Figures 3 and 4.
Figure 3. Summary results on risk of bias and applicability assessment (PROBAST).
Figure 4. Risk of bias in included studies.

Network meta-analysis
The evidence network for this study is presented in Figure 5. In the figure, distinct prediction models are denoted by dots of varying size, and the connecting lines signify direct comparisons between these models. There are eight prediction models in total, with numbers 1-8 corresponding to ANN, Ensemble model, GBT, LASSO, RF, SVM, XGBoost, and LR. These eight prediction models form multiple interconnected loops, facilitating both direct and indirect comparisons (Supplement S3).
Figure 5. Network map. Abbreviations: 1. ANN, 2. Ensemble model, 3. GBT, 4. LASSO, 5. RF, 6. SVM, 7. XGBoost, 8. LR.
Prediction performance and ranking
We ranked the eight prediction models based on the surface under the cumulative ranking curve (SUCRA), with the ranking derived from the Bayesian analysis. XGBoost (SUCRA 100%) is most likely to rank first among all models analyzed, followed by LASSO (SUCRA 84.8%), ANN (SUCRA 72.1%), SVM (SUCRA 53.0%), Ensemble model (SUCRA 40.8%), RF (SUCRA 25.6%), LR (SUCRA 21.8%), and GBT (SUCRA 1.1%).
We assessed inconsistency between pairs of prediction models using the node-splitting method, with p < .05 taken to indicate local inconsistency. For the network meta-analysis, models reporting the AUC of the receiver operating characteristic (ROC) curve were considered to identify the top-performing model. Higher SUCRA values signify superior model performance, and the findings reveal that XGBoost achieved the highest performance (Figure 6).
Figure 6. SUCRA values. Abbreviations: (a) ANN, (b) Ensemble model, (c) GBT, (d) LASSO, (e) RF, (f) SVM, (g) XGBoost, (h) LR.
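For readers unfamiliar with SUCRA, the sketch below shows how the value is obtained from a model's rank probabilities: it is the mean of the cumulative probabilities of being among the best 1, 2, ..., a-1 of a models. The function name and the example probability vectors are hypothetical and are not taken from the study's output.

```python
def sucra(rank_probs):
    """SUCRA for one model.

    rank_probs[j] is the probability that the model holds rank j + 1,
    where rank 1 is best. The probabilities must sum to 1.
    """
    a = len(rank_probs)
    if a < 2:
        raise ValueError("need at least two ranks")
    if abs(sum(rank_probs) - 1.0) > 1e-9:
        raise ValueError("rank probabilities must sum to 1")

    cumulative = 0.0
    total = 0.0
    # Sum the cumulative probabilities for ranks 1 .. a-1
    for p in rank_probs[:-1]:
        cumulative += p
        total += cumulative
    return total / (a - 1)


# With eight models, a model that is always ranked first has SUCRA 1 (100%),
# and a model that is always ranked last has SUCRA 0.
always_first = [1.0] + [0.0] * 7
always_last = [0.0] * 7 + [1.0]
print(sucra(always_first), sucra(always_last))
```

A SUCRA of 100%, as reported for XGBoost, therefore corresponds to a model ranked first in essentially every posterior draw.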
Subgroup analysis
We performed separate subgroup analyses based on country, study population, and model categorization (Supplement S4). Subgroup analyses showed that the study population (THA) (I² = 28.39%) was a source of heterogeneity. Subgroup analyses categorized by country and model type showed little difference.
Publication bias and sensitivity analysis
The funnel plot suggests publication bias among the included studies (Figure 7). In the sensitivity analysis (Figure 8), studies were removed one at a time; excluding any single study did not materially affect the results, indicating that they are stable.
Figure 7. Funnel plot. Abbreviations: (A) ANN, (B) Ensemble model, (C) GBT, (D) LASSO, (E) RF, (F) SVM, (G) XGBoost, (H) LR.
Figure 8. Sensitivity analysis.
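The one-study-removed procedure behind the sensitivity analysis can be sketched as follows. For brevity this example pools with simple inverse-variance (fixed-effect) weights, which is an assumption that differs from the random-effects model described in the Methods; the function names and input values are hypothetical.

```python
def inverse_variance_mean(estimates, std_errors):
    """Inverse-variance weighted mean (fixed-effect pooling)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)


def leave_one_out(estimates, std_errors):
    """Re-pool the estimates with each study removed in turn."""
    if len(estimates) < 3 or len(estimates) != len(std_errors):
        raise ValueError("need at least three studies with matching SEs")
    pooled = []
    for i in range(len(estimates)):
        kept_y = estimates[:i] + estimates[i + 1:]
        kept_se = std_errors[:i] + std_errors[i + 1:]
        pooled.append(inverse_variance_mean(kept_y, kept_se))
    return pooled


# Hypothetical AUCs with equal standard errors.
for omitted, value in enumerate(leave_one_out([0.7, 0.8, 0.9], [0.1, 0.1, 0.1])):
    print(f"omit study {omitted + 1}: pooled AUC = {value:.3f}")
```

Results are considered stable when the re-pooled estimates stay close to the overall estimate regardless of which study is omitted.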

Discussion
TKA and THA are the most common surgical procedures performed on the knee and hip. In the U.S., initial THA is projected to increase by 71% and initial TKA by 85% by 2030. 25 DVT is a severe complication after TKA and THA that causes a heavy financial burden on patients and increases mortality. 26 The guidelines recommend that thromboprophylaxis be individualized according to the needs and characteristics of the patient.27–29 Predictive models can help healthcare providers optimize decision-making and accurately estimate individual patient risk. In recent years, several studies have developed models for predicting DVT in patients after knee and hip arthroplasty.10–20 However, there is still no recognized and authoritative prediction model.
It is imperative to pinpoint an appropriate prediction model for patients undergoing knee and hip arthroplasty. In this systematic review, we examined 11 studies that reported eight prediction models for DVT in knee/hip arthroplasty patients. The majority of these studies originated from China, with the remaining two conducted in the United States. All of the included models exhibited moderate to strong predictive capabilities, with AUC or C-index values ranging from 0.62 to 0.988. Our findings highlight the superior predictive performance of the XGBoost model. However, it is important to note that all studies were classified as having a high risk of bias, and three of them raised significant concerns regarding applicability, as indicated by the PROBAST tool. Consequently, it is premature to endorse any single model for widespread clinical adoption. Future prediction models for DVT in this context should prioritize methodological quality enhancements.
Several studies10,13,18,20 focused on the XGBoost algorithm. Noam Shohat 10 and colleagues developed and validated predictive models utilizing RF, LASSO, XGBoost, and SVM, revealing crucial predictors such as tranexamic acid, anesthesia type, and prophylaxis type. While their models showed high predictive performance, the opacity of the methodology and its reliance on some assumptions limited prospective use. Yuhuan Chen and colleagues 13 also employed the XGBoost algorithm, but their study, though high in sensitivity and specificity, had limitations, including a small single-center sample and the absence of hierarchical analysis of factors. Jiali Liu’s study, 18 which used the XGBoost model, similarly showed promise, yet it was restricted by its small sample size and retrospective nature.
Other studies11,12,15,16,19,20 relied on traditional logistic regression. Ze Lin’s work, 11 conducted in an Asian population, emphasizes the importance of testing models in diverse populations, including white and black individuals. Xinguang Wang 12 explored a vast number of predictors, enabling automated DVT alerts but with some dependency on clinician and technician expertise. Chen Lv 15 developed a logistic regression model featuring age, diabetes, serum D-dimer levels, and surgical site as key predictors, with implications for DVT risk assessment after total knee replacement. Shuai Han 16 constructed a nomogram for elderly patients with lower limb fractures, albeit in a retrospective setting with some selection bias. Xiaojuan Qin’s study 19 incorporated age, blood glucose levels, and other factors, showing favorable model performance.
In Elham Rasouli Dezfouli’s research, 14 a deep neural network model was used, revealing novel risk factors for VTE prediction. This model, however, relied on outdated standards and might improve if contemporary ones were used.
All these studies faced challenges such as small sample sizes, retrospective designs, and a lack of calibration or hierarchical analysis. Their results hold promise but warrant further validation in larger, prospective, and more diverse cohorts. Researchers should also prioritize methodological quality enhancements in future prediction models for DVT in knee and hip arthroplasty patients.
Most of the models lacked internal validation or used random data splitting inadequately. Only a few models used cross-validation, and the handling of missing data was often poor or unclear. Only three studies reported results from the Hosmer-Lemeshow goodness-of-fit test. Most studies used a single center, which limited the generalizability of the predictive models developed and validated. Some models had small sample sizes and few events per variable, which may have inflated the c-statistics. Only one study reported externally validated performance. Given these shortcomings, there is an increased risk of bias despite the good discriminatory performance of these models.
All prediction models had a high risk of bias (RoB) or applicability issues. The main bias issues were found in the analysis domain of PROBAST. They included small sample sizes, poor handling of missing data, lack of internal or external validation, and poor or missing reporting of calibration measures. Applicability issues were primarily due to highly selected populations. This may reflect the limited research on modeling and prediction with machine learning and the lack of reporting standards, which makes it difficult to ensure the quality of the research. 30 The predictor variables included also vary between studies, and the type of predictor variable usually has a decisive impact on the results, so it is difficult to eliminate or reduce the heterogeneity arising from predictor variables.
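The events-per-variable (EPV) concern can be made concrete with a short calculation. The sketch below is illustrative only: the function name and the numbers are hypothetical, and the threshold of 10 is a commonly quoted rule of thumb rather than a criterion stated in the source.

```python
def events_per_variable(n_events, n_candidate_predictors):
    """Ratio of outcome events to candidate predictors considered."""
    if n_candidate_predictors <= 0:
        raise ValueError("need at least one candidate predictor")
    if n_events < 0:
        raise ValueError("number of events cannot be negative")
    return n_events / n_candidate_predictors


# Hypothetical study: 24 DVT events and 12 candidate predictors.
epv = events_per_variable(24, 12)
# Assumed rule of thumb: EPV below 10 suggests a risk of overfitting.
print(epv, "possible overfitting" if epv < 10 else "adequate by this rule")
```

A low EPV means the model has few outcome events relative to the number of predictors examined, which is one way apparent discrimination can look better in the development data than it would in new patients.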
Subgroup analyses revealed that heterogeneity within the study population (THA) stemmed from specific factors. This could be attributed to the limited number of prediction models focused on hip arthroplasty, which potentially impacted the precision of the results.
Among the eight prediction models, the XGBoost model stands out as the top performer with a SUCRA score of 100.0%, while the LASSO model follows as the second-best option with a SUCRA score of 84.8%. The XGBoost algorithm is efficient and has demonstrated strong performance and high accuracy when applied to the diagnosis of diseases, as well as the analysis of data on the risk of disease occurrence, regression and prognosis, rational and safe use of medication, and drug development.31–34 These rankings align closely with the overall findings. No model can be recommended for widespread clinical use at this time, and future models for predicting DVT after knee and hip arthroplasty should prioritize methodological quality and study design.
This study aimed to systematically evaluate the currently available prediction models for DVT after knee/hip arthroplasty and compare their predictive performance to identify the best prediction models for wide clinical application. However, our results showed that although most models had high predictive ability, the quality of the current studies is generally low, and subsequent studies should focus on improving study quality.
Strengths
Our NMA integrates all available data and allows simultaneous direct or indirect comparison and ranking of the predictive models of DVT, improving statistical power and resolution and obtaining more accurate data. This is also the first network meta-analysis to focus on performance comparison and optimal model selection in lower limb venous thrombosis risk prediction models after knee and hip replacement. The main findings of this study suggest that XGBoost is the best model for predicting venous thrombosis in the lower extremities. Secondly, the research published in Chinese and English is within the scope of our study, with no publishing restrictions, and includes all eligible data, thus increasing the study’s strength.
Limitations
Firstly, most of the studies lacked information on their methodology, which, to some extent, affected the study results. Secondly, the included studies were homogeneous in design, all being cohort studies, and the number of included original studies was small. The index used to evaluate the discrimination of the prediction models is relatively general and has some limitations. Thirdly, limited by the original research, only 11 studies were included in this study, and the number of model comparisons was small, so the results may not fully reflect the predictive value of the different prediction models. Finally, some undetected bias and heterogeneity may have impacted our primary outcome.
Conclusion
In conclusion, this review shows that machine learning models can accurately predict the risk of DVT in patients after knee/hip arthroplasty, but their quality still needs to be improved. An increasing body of literature has emerged regarding prediction models for DVT in patients following knee/hip arthroplasty, aimed at aiding medical decision-making and strategy development. Nonetheless, our review has illuminated that while existing models exhibit moderate to good predictive capabilities, they suffer from inadequate reporting and a heightened susceptibility to bias, potentially presenting an overly optimistic view of their performance.
Future research should focus on several critical areas for improvement. These encompass refining techniques for managing missing data, thoroughly addressing model calibration, and rigorously validating risk prediction models. It is equally essential for researchers to take into account factors like follow-up duration and the intricacies of study design, as these factors significantly impact the reliability and applicability of predictive models. Moreover, future research should aim to externally validate existing models in diverse countries, evaluate both discrimination and calibration performance, and conduct impact studies to enhance the overall quality of research in this domain.
Supplemental Material
Supplemental Material - Prediction models for deep vein thrombosis after knee/hip arthroplasty: A systematic review and network meta-analysis
Supplemental Material for Prediction models for deep vein thrombosis after knee/hip arthroplasty: A systematic review and network meta-analysis by Qingqing Zeng, Zhuolan Li, Sijie Gui, Jingjing Wu, Caijuan Liu, Ting Wang, Dan Peng and Guqing Zeng in Journal of Orthopaedic Surgery.
Footnotes
Acknowledgments
We would like to thank the School of Nursing, Hengyang Medical School, University of South China; University of South China-Hunan Lantern Medical Technology Co, Ltd School-Enterprise Cooperative Innovation and Entrepreneurship Education Base, Hengyang, Hunan Province, China.
Author contributions
All authors made substantial contributions to the conception or design of the work: Qingqing Zeng and Zhuolan Li designed this study, screened and selected studies for inclusion, extracted data, performed meta-analyses and drafted the manuscript; the first draft of the manuscript was written by Qingqing Zeng; Sijie Gui screened and selected studies for inclusion, performed meta-analyses and drafted the manuscript; Jingjing Wu, Caijuan Liu, and Ting Wang evaluated the methodological quality of the included studies; Dan Peng and Guqing Zeng assisted with the study design, results confirmation and manuscript composition, as well as offered advanced professional information in relevant disciplines. The final manuscript has been critically read and approved by all the authors.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Science Foundation of Hunan Province (Grant number: 2021JJ30586), the Project of Hunan Provincial Department of Finance (Grant number: [2022]44), and the Project of Hunan Provincial Department of Finance (Grant number: [2023]31).
PROSPERO registration
CRD42023418654.
Data availability statement
The data underlying this article will be shared on reasonable request to the corresponding author.
Supplemental Material
Supplemental material for this article is available online.
Appendix
References
