Abstract
Background:
The rising incidence of thyroid cancer presents a growing diagnostic and therapeutic challenge. Various risk stratification systems have sought to integrate clinical, ultrasonographic, and, in some cases, cytological features to aid malignancy prediction. This systematic review critically evaluates risk stratification tools (RSTs) for patients with thyroid nodules that incorporate multimodal inputs, assessing their diagnostic performance and clinical utility in supporting surgical decision-making.
Methods:
PubMed, Embase, and Cochrane databases were searched from inception to 04/13/2026 to identify studies evaluating multivariable risk prediction models in adult patients undergoing assessment of thyroid nodules. Studies were excluded if the proposed tool did not incorporate clinical features, ultrasound findings, and cytology results, or if it was not validated against histology. Data extraction encompassed the methodology of model development, performance metrics, and approaches to validation. Risk of bias was assessed using the PROBAST+AI tool.
Results:
Seven studies describing five distinct RSTs met the inclusion criteria: the Thyroid Nodule App (TNAPP), the McGill Thyroid Nodule Score (MTNS), the CUT Score, the Memorial Sloan Kettering Cancer Centre (MSKCC) nomogram, and the Thyroid Prediction Score (TiPS). TiPS demonstrated the highest sensitivity (96.2%) and specificity (97.5%), with an area under the curve (AUC) >0.9. The CUT Score also showed strong performance (AUC >0.9), particularly in low-to-intermediate risk nodules. TNAPP underperformed (accuracy 50.5%; specificity 27.5%) despite its broad clinical inputs. The MTNS and the MSKCC nomogram, although promising for indeterminate cytology, lacked robust validation. Most models were derived from single-center, retrospective cohorts, limiting their generalizability.
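To make the reported metrics concrete, the following is a minimal Python sketch of how sensitivity, specificity, accuracy, and AUC are conventionally computed when a tool's output is compared against histology as the reference standard. The `Confusion` class and all function names are hypothetical, the counts and scores in the usage lines are invented for illustration and do not come from any of the included studies, and the AUC uses the rank-based (Mann-Whitney) formulation, which is not necessarily the method used by the reviewed studies.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical 2x2 table of tool result versus histology."""
    tp: int  # malignant on histology, tool positive
    fp: int  # benign on histology, tool positive
    tn: int  # benign on histology, tool negative
    fn: int  # malignant on histology, tool negative


def sensitivity(c: Confusion) -> float:
    # Proportion of histologically malignant nodules the tool flags.
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Proportion of histologically benign nodules the tool clears.
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    # Proportion of all nodules classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def auc(scores_malignant: list[float], scores_benign: list[float]) -> float:
    """Rank-based AUC: probability that a random malignant nodule scores
    higher than a random benign one (ties count as one half)."""
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))


# Invented counts: good sensitivity but poor specificity pulls accuracy down.
example = Confusion(tp=40, fp=30, tn=20, fn=10)
print(round(sensitivity(example), 3))  # 0.8
print(round(specificity(example), 3))  # 0.4
print(round(accuracy(example), 3))     # 0.6
print(round(auc([0.9, 0.8, 0.4], [0.3, 0.5]), 3))  # 0.833
```

The invented example illustrates why a single summary figure can mislead: accuracy depends on the mix of malignant and benign nodules in the cohort, whereas sensitivity and specificity are reported conditional on the histological outcome.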
Conclusions:
RSTs integrating multimodal data may improve thyroid nodule risk stratification, particularly in cases of indeterminate cytology. However, methodological limitations and lack of external validation currently restrict clinical utility. Prospective evaluation in diverse populations is required to identify the most effective and generalizable tools. Until then, RSTs should be used as adjuncts to, not replacements for, clinical judgment and shared decision-making in thyroid nodule assessment.
