Abstract
Background:
Chronic rhinosinusitis (CRS) is a prevalent condition frequently evaluated using computed tomography (CT). The application of artificial intelligence (AI) in CRS imaging analysis is expanding; however, comprehensive assessments of its diagnostic performance remain scarce.
Objective:
To systematically review and compare the diagnostic accuracy of AI models for interpreting CT images of CRS, focusing on sensitivity, specificity, accuracy, and area under the curve (AUC).
Methods:
This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Six databases were searched for original studies that assessed AI, machine learning, or deep learning models for CRS diagnosis using CT. Eligible studies reported diagnostic performance metrics for human subjects. The extracted data included the AI model type, validation method, imaging modality, reference standard, and diagnostic outcomes.
Results:
Of the 1502 screened articles, 6 studies involving 2178 patients met the inclusion criteria. Most utilized convolutional neural networks, residual networks (ResNets), or hybrid deep learning models. The sensitivity, specificity, and accuracy ranged from 11.1% to 98.1%, 86.4% to 98.7%, and 63.6% to 98.4%, respectively. AUC values reached up to 0.99.
Conclusion:
AI, particularly ResNet-based models, demonstrates promising diagnostic accuracy for CRS CT interpretation. However, methodological heterogeneity limits comparability, underscoring the need for standardized, multicenter validation and integration of clinical data to enhance generalizability.
Plain Language Summary
Chronic rhinosinusitis (CRS) is a long-lasting inflammation of the sinuses that causes nasal congestion, facial pressure, and loss of smell. It is often evaluated using computed tomography (CT), which helps doctors confirm the diagnosis and plan treatment. Artificial intelligence (AI) has recently been applied to CT imaging to improve diagnostic accuracy and reduce interpretation time.
This systematic review analyzed all available studies that assessed the accuracy of AI in diagnosing CRS using CT. Six studies, including 2178 patients, were included. Most studies used deep learning models, particularly convolutional neural networks (CNNs) and residual networks (ResNets). The reported diagnostic performance was high, with accuracy reaching 98% and an area under the curve (AUC) of up to 0.99.
These results show that AI can reliably assist in the detection of sinus diseases in CT images. However, most studies have been limited to single centers, highlighting the need for larger, standardized, multicenter studies before clinical implementation.
Introduction
Chronic rhinosinusitis (CRS) is a common inflammatory disease of the nasal and paranasal sinuses, characterized by symptoms such as nasal obstruction, facial pain, or pressure, and a reduced sense of smell lasting for at least 12 weeks. CRS is estimated to affect approximately 8% to 11% of adults worldwide, although prevalence rates vary across regions.1-3 In addition to its high prevalence, CRS poses a substantial clinical and socioeconomic burden, significantly impairing patients’ quality of life and contributing to considerable healthcare expenditures.3,4
Accurate diagnosis and effective management of CRS typically require a combination of clinical evaluation and imaging, particularly computed tomography (CT) of the paranasal sinuses. CT imaging provides essential anatomical details, facilitates surgical planning, and is critical for evaluating disease severity using standardized systems such as the Lund–Mackay score (LMS).5 However, the interpretation of CT scans may vary among clinicians and can be limited by differences in expertise, time constraints, or access to subspecialty consultation.
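As an illustrative aid, the LMS grades each of 5 paranasal sinus regions per side from 0 (no opacification) to 2 (complete opacification) and the ostiomeatal complex as 0 or 2, yielding a total of 0 to 24. The following minimal Python sketch, with hypothetical region names and input values, shows how the total is assembled; it is not code from any included study.

```python
# Illustrative Lund-Mackay scoring; region names and example grades are hypothetical.
REGIONS = ["maxillary", "anterior_ethmoid", "posterior_ethmoid",
           "sphenoid", "frontal"]

def lund_mackay(side_scores: dict) -> int:
    """Sum per-side region grades into a total LMS (0-24)."""
    total = 0
    for side in ("left", "right"):
        grades = side_scores[side]
        for region in REGIONS:
            assert grades[region] in (0, 1, 2)  # 0 clear, 1 partial, 2 complete
            total += grades[region]
        assert grades["ostiomeatal_complex"] in (0, 2)  # graded 0 or 2 only
        total += grades["ostiomeatal_complex"]
    return total

# Example: partial maxillary and anterior ethmoid opacification with
# obstructed ostiomeatal complexes on both sides.
example = {
    side: {"maxillary": 1, "anterior_ethmoid": 1, "posterior_ethmoid": 0,
           "sphenoid": 0, "frontal": 0, "ostiomeatal_complex": 2}
    for side in ("left", "right")
}
print(lund_mackay(example))  # 8
```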
In recent years, the integration of artificial intelligence (AI), particularly deep learning algorithms, such as convolutional neural networks (CNNs), has shown promise in improving the diagnostic accuracy of medical imaging across various specialties.6,7 In rhinology, emerging AI applications have primarily focused on automating CT analysis to detect sinus opacification, segment anatomical structures, and predict disease classification. Several studies have reported encouraging performance metrics, including high sensitivity and specificity.8,9
Despite their growing popularity, there is currently no comprehensive evidence to evaluate the diagnostic performance of AI models for CT imaging of CRS. This represents a critical gap, particularly as AI tools are increasingly being proposed for use in clinical decision support.
This study aimed to systematically evaluate and compare the performance of AI models used to diagnose CRS through CT interpretation, with particular attention paid to diagnostic accuracy, sensitivity, specificity, and area under the curve (AUC). By identifying the most effective approaches and highlighting areas for improvement, this review seeks to inform the future development and clinical implementation of AI in rhinology practice.
Materials and Methods
A systematic review of the published English literature was conducted to evaluate the application of AI models in CRS. This review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and followed the recommendations outlined in the PRISMA checklist throughout study selection, data extraction, and reporting.10 The following databases were systematically searched: PubMed (MEDLINE), OpenAIRE, ScienceDirect, Web of Science, Google Scholar, and Springer Nature Journals. A comprehensive electronic search strategy was developed for each database, incorporating both keywords and Medical Subject Headings (MeSH) terms. The search terms included combinations of “artificial intelligence,” “deep learning,” “machine learning,” “computed tomography,” “sinus,” “chronic rhinosinusitis,” and “diagnosis.”
Eligibility Criteria
The articles included in this systematic review were restricted to English-language publications that presented original data. The eligible study designs included prospective and retrospective cohort, cross-sectional, and diagnostic accuracy studies. Randomized controlled trials, although uncommon in this field, were also considered. Two independent reviewers screened all articles for inclusion based on the predefined eligibility criteria. Studies were included if they (1) applied AI, deep learning, or machine learning models; (2) incorporated sinus CT imaging within the diagnostic workflow for CRS; (3) involved human participants; and (4) reported performance metrics such as accuracy, sensitivity, specificity, and AUC. Studies were excluded if they lacked original data; focused on basic science, animal models, or cadaveric studies; used non-CT imaging modalities; or failed to provide sufficient detail on model validation or reference standards. Review articles, expert opinions, abstracts without full text, and studies with unclear methodologies were excluded. Final inclusion was determined by full-text review following the initial title and abstract screening. Several prior AI-based imaging studies were excluded from the final analysis because they did not report diagnostic accuracy metrics, relied on non-CT modalities, or lacked transparent validation strategies; however, they were referenced for contextual background.
Data Extraction
Data extraction was independently performed by 2 investigators using a standardized data collection form. Any discrepancies related to study inclusion or extracted variables were resolved through discussion and consensus. The extracted data included the publication year, study design, country, sample size, patient demographics, AI model type, imaging modality, and reference standards. Additional methodological details were systematically extracted to improve transparency and comparability across the studies. These included CT data sources (single-center versus multicenter datasets) and model validation strategies. The validation approaches were explicitly categorized as internal (holdout validation or cross-validation within the same dataset) or external (testing on independent datasets). The extracted performance metrics included the sensitivity, specificity, accuracy, and AUC. Variability and incomplete reporting of methodological details across studies were noted and considered during the interpretation of the results.
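For transparency, the extracted performance metrics follow their standard definitions from the binary confusion matrix. The short Python sketch below, using hypothetical labels and scores rather than data from any included study, illustrates how sensitivity, specificity, accuracy, and AUC are computed.

```python
# Illustrative computation of the extracted metrics from binary labels and
# model scores; the values below are hypothetical, not study data.
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]                    # 1 = CRS per reference standard
y_score = [0.9, 0.8, 0.3, 0.2, 0.4, 0.1, 0.7, 0.6]   # model probabilities
y_pred = [int(s >= 0.5) for s in y_score]            # threshold at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true-positive rate
specificity = tn / (tn + fp)   # true-negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)
auc = roc_auc_score(y_true, y_score)  # threshold-independent discrimination

print(f"Se={sensitivity:.2f} Sp={specificity:.2f} "
      f"Acc={accuracy:.2f} AUC={auc:.2f}")
```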
The methodological quality and risk of bias of the included studies were evaluated using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool, which assesses bias across 4 domains: patient selection, index test, reference standard, and flow and timing. Given the inclusion of AI models, AI-specific considerations were further evaluated according to the QUADAS-AI extension.11
Results
Among the 1502 unique research papers initially identified, 63 were selected for primary screening. After full-text evaluation, 6 studies comprising 2178 patients were included in the final review (Figure 1). All of the included studies provided level III evidence and were published within the past 2 years. Five studies were conducted in China, and one was conducted in the United States. All included studies relied on internal validation approaches, using either holdout validation or cross-validation, and none performed external validation using an independent cohort (Table 1).

Figure 1. PRISMA flow diagram illustrating the screening process of articles according to the study’s inclusion and exclusion criteria. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
Table 1. Characteristics of the Included Studies.
Abbreviations: CMS, chronic maxillary sinusitis; CRS, chronic rhinosinusitis; CRSwNP, chronic rhinosinusitis with nasal polyps; eCRS, eosinophilic CRS.
A range of AI models were used across the included studies. CNNs, residual networks (ResNets), and traditional machine learning models were the most frequently adopted. Deep learning approaches included standard deep neural networks (DNNs) and hybrid models, such as 1-dimensional CNNs combined with long short-term memory networks. Regarding data inputs, all studies primarily relied on CT interpretation, while 1 study additionally incorporated ≥2 cardinal symptoms of CRS. Different reference standards were used to validate the performance of the AI models, primarily based on clinical, radiological, or histopathological criteria. Raghavan et al defined the primary reference standard as radiological evidence of sinonasal inflammation with a LMS ≥5, while the secondary endpoint combined LMS ≥5 with at least 2 cardinal symptoms of CRS.12 Lai et al applied a clinical diagnosis of CRS based on established guidelines, supplemented by CT-based Lund–Mackay scoring.13 Zhang et al relied on expert consensus, with diagnoses confirmed by 2 rhinologists using endoscopy and CT imaging.9 Du et al employed histopathological examination of nasal polyps as the diagnostic reference.16 These diverse reference standards reflect heterogeneity in clinical practice and emphasize the need for unified criteria when benchmarking AI performance in sinonasal imaging (Table 2).
Table 2. Diagnostic Performance of Artificial Intelligence Models.
Abbreviations: AI, artificial intelligence; AUC, area under the curve; CNN, convolutional neural network; CRS, chronic rhinosinusitis; CT, computed tomography; DNN, deep neural network; LMS, Lund–Mackay score; ResNet, residual network; SVM, support vector machine; VGG, visual geometry group; XGBoost, extreme gradient boost.
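To make the hybrid architecture named above concrete, the following Python sketch outlines a 1-dimensional CNN feeding a long short-term memory layer. The input shapes, layer sizes, and feature semantics are illustrative assumptions, not the design of any included study.

```python
# Hedged sketch of a 1-D CNN + LSTM hybrid: the convolution extracts local
# patterns from a sequence of per-slice CT features (assumed inputs), and
# the LSTM aggregates context across slices before classification.
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, in_features: int = 64, n_classes: int = 2):
        super().__init__()
        self.conv = nn.Sequential(                    # local pattern extraction
            nn.Conv1d(in_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(32, 16, batch_first=True)  # cross-slice context
        self.head = nn.Linear(16, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, slices, features); Conv1d expects (batch, features, slices)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        _, (h_n, _) = self.lstm(h)                    # final hidden state
        return self.head(h_n[-1])

# One forward pass on a dummy batch: 2 scans, 40 slices, 64 features each.
print(CNNLSTM()(torch.randn(2, 40, 64)).shape)  # torch.Size([2, 2])
```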
The key metrics used to evaluate the performance of the AI models included sensitivity, specificity, accuracy, and AUC. The sensitivity values varied widely among the included studies, ranging from 11.1% to 98.1%. Raghavan et al reported low sensitivity for the primary outcome when using CT alone for diagnosis in both the random forest model and the DNN.12 However, when CT data were combined with the cardinal symptoms of CRS, the sensitivity of the DNN increased by 43.4%. In contrast, Lai et al and Zou et al demonstrated substantially higher sensitivities of 98.1% and 97.96%, respectively.13,14 Du et al reported a sensitivity of 96.1%,16 whereas 1 study did not provide sensitivity values.
The specificity was consistently high across most studies, ranging from 86.36% to 98.7%. Lai et al reported the highest specificity at 98.7%, while Zhang et al described specificity as “very high” without specifying the exact value.9,13 Two studies, by Zou et al and Du et al, did not explicitly report specificity values.9,15
The reported accuracy ranged from 63.6% to 98.43%, reflecting variations in model architecture and diagnostic objectives. The lowest accuracy was observed in the study by Raghavan et al (63.6% for the primary endpoint), whereas Lai et al reported the highest accuracy of 98.43%.12,13 Other studies, including those by Zou et al and Du et al, achieved accuracy levels exceeding 85%. Du et al reported training and validation accuracies of approximately 99% and 96%, respectively.14-16
Overall, the AUC values demonstrated strong discriminative performance. Zou et al reported the highest AUC of 0.99, whereas Raghavan et al documented the lowest AUC of 0.596 for the primary endpoint (Table 2).12,15
These findings indicate substantial variations in diagnostic performance among AI models applied to sinus imaging, reflecting differences in study design, endpoint definitions, and algorithmic complexity. Although many models, particularly those using deep learning techniques, exhibit high sensitivity, specificity, and AUC values, incomplete or inconsistent reporting across several studies limits the strength of cross-study comparisons.
Importantly, variations in diagnostic performance across studies were closely linked to differences in model architecture, validation strategies, reference standards, and clinical task definitions. Studies reporting lower sensitivity primarily relied on CT-only inputs for broad CRS diagnosis, whereas models achieving higher AUCs typically addressed narrower tasks, such as endotype differentiation or bone remodeling assessment, using multiview or attention-enhanced deep learning architectures. These distinctions highlight that the reported performance metrics should be interpreted in the context of the intended diagnostic objective, rather than as a universal measure of model superiority.
Quality Assessment
Most studies demonstrated a low risk of bias across key domains, particularly in patient selection, reference standards, and flow and timing. Raghavan et al, Lai et al, and Du et al showed a consistently low risk across all domains, indicating robust methodological quality.12,13,16 In contrast, Zhang et al and Zou et al presented a higher risk in the patient selection and index test domains, with some unclear reporting in flow and timing.9,14,15 Applicability concerns were generally low to moderate across all studies, suggesting reasonable relevance to clinical settings, despite some methodological limitations (Table 3).
Table 3. QUADAS-AI Risk of Bias and Applicability Assessment for Included Studies.
Abbreviations: AI, artificial intelligence; QUADAS, Quality Assessment of Diagnostic Accuracy Studies.
Discussion
AI is rapidly transforming the field of medical diagnostics, offering unprecedented accuracy and efficiency in pattern recognition and clinical decision-making. In rhinology, particularly in the context of CRS, AI has demonstrated promising capabilities in interpreting imaging data, classifying disease subtypes, and potentially guiding patient management. The present review synthesizes the current evidence on the diagnostic performance of AI models applied to sinus CT, emphasizing their potential and limitations in clinical application.
The findings from the 6 included studies indicated that deep learning architectures, particularly CNNs and ResNets, achieved high sensitivity, specificity, and diagnostic accuracy for detecting CRS and differentiating its endotypes. Lai et al achieved a sensitivity of 98.1% and specificity of 98.7% using a multichannel ResNet model trained on multiangle CT views for predicting eosinophilic chronic rhinosinusitis with nasal polyps (CRSwNP),13 highlighting the importance of architectural depth and multiview input in enhancing diagnostic performance. Similarly, Du et al employed a multiview lightweight attention-enhanced network and reported near-perfect accuracy with an AUC of 0.993, underscoring the scalability of efficient architectures in real-world datasets.16
Sensitivity values varied considerably among the included studies. While some models demonstrated suboptimal sensitivity, others, particularly those developed by Lai et al and Zou et al, achieved sensitivities of 98.1% and 97.96%, respectively.13,14 This variation suggests that although AI algorithms perform well in detecting pronounced disease patterns, they may be less reliable in identifying subtle or early-stage CRS manifestations such as mild mucosal thickening or partial sinus opacification.
However, the specificity was consistently high across most studies, ranging from 86.36% to 98.7%, indicating that the models were highly effective at correctly identifying patients without CRS. In other words, these AI systems rarely produced false-positive results, seldom flagging disease-free scans as abnormal.
These results align with earlier works by Hashimoto et al and He et al, who emphasized that AI models trained on large, well-annotated imaging datasets can outperform human experts in specific radiologic and pathologic recognition tasks.17,18 Furthermore, radiomics-based studies, such as those by He et al, have demonstrated excellent discriminative ability in differentiating sinonasal tumors from inflammatory polyps, suggesting that texture-based features, often imperceptible to the human eye, can significantly enhance diagnostic accuracy.19
Models employing deeper architectures, particularly multiview and attention-enhanced ResNet variants, generally demonstrated diagnostic performance superior to that of conventional CNNs and traditional machine learning approaches. This advantage appears to be driven by richer feature extraction, greater network capacity, and better spatial contextualization across multiple sinonasal regions, which are critical for capturing the heterogeneous and often subtle inflammatory patterns of CRS on CT imaging.
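As a concrete illustration of the multiview pattern described above, the following Python sketch shares a ResNet backbone across several CT views and averages the per-view features before classification. The backbone choice, fusion rule, and dimensions are assumptions for exposition, not the architecture of any included study.

```python
# Minimal multiview ResNet sketch: one shared backbone encodes each CT view
# (e.g., axial, coronal, sagittal crops), and the averaged features feed a
# small classification head. Illustrative only.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiViewResNet(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        backbone = resnet18(weights=None)      # train from scratch
        backbone.fc = nn.Identity()            # expose 512-d features
        self.backbone = backbone
        self.head = nn.Linear(512, n_classes)  # fused-feature classifier

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, n_views, 3, H, W) -> fold views into the batch dim
        b, v, c, h, w = views.shape
        feats = self.backbone(views.reshape(b * v, c, h, w))  # (b*v, 512)
        fused = feats.reshape(b, v, -1).mean(dim=1)           # average views
        return self.head(fused)                               # (b, n_classes)

# One forward pass on a dummy batch of 3-view CT crops.
logits = MultiViewResNet()(torch.randn(2, 3, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 2])
```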
More importantly, the reliability of the reported performance metrics must be interpreted in light of the validation strategies employed. All included studies relied exclusively on internal validation methods, such as holdout validation or cross-validation, within single-center datasets, and none performed true external validation using independent multi-institutional cohorts. While internal validation is appropriate for model development and preliminary performance assessment, the absence of external validation limits evaluation of generalizability and may lead to optimistic performance estimates in real-world clinical settings.
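To clarify the distinction, the following Python sketch contrasts the 2 internal validation schemes used by the included studies: a holdout split and k-fold cross-validation within a single cohort. The classifier and synthetic features are placeholders; external validation would instead evaluate a frozen model on an independent cohort from another center.

```python
# Sketch of internal validation schemes; synthetic features and labels
# stand in for CT-derived inputs (an assumption for illustration).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))    # placeholder image-derived features
y = rng.integers(0, 2, size=200)  # placeholder CRS labels

# Internal validation 1: holdout (train/test split of the same cohort).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("holdout accuracy:", model.score(X_te, y_te))

# Internal validation 2: 5-fold cross-validation within the same cohort.
print("cv accuracy:", cross_val_score(
    RandomForestClassifier(random_state=0), X, y, cv=5).mean())

# External validation (absent from all included studies) would evaluate
# the already-trained model on data from a different institution.
```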
From a clinical perspective, these findings indicate that although AI demonstrates promising diagnostic accuracy, its current role should be regarded as supportive and adjunctive rather than autonomous, particularly for specific diagnostic tasks such as endotype differentiation, rather than for standalone confirmation of CRS. This technology may assist clinicians in streamlining image interpretation and reducing diagnostic variability. However, final diagnostic and management decisions must remain under expert supervision to ensure patient safety and appropriate clinical judgment.
Despite these promising findings, several challenges remain. The studies included in this review exhibited considerable methodological heterogeneity, including variations in the data sources, CT acquisition protocols, AI model architectures, validation techniques, and reference standards. For instance, while Lai et al and Zhang et al used CT and clinical criteria as reference standards, Du et al and Zou et al relied on histopathological analysis.9,13-15 Such inconsistencies complicate cross-study comparisons and underscore the urgent need for standardized AI validation frameworks in CRS diagnostics.
Furthermore, most of the included studies were retrospective and conducted at single centers, which limited the generalizability of the results (Table 4). Only 1 study was conducted in the United States, whereas the remaining were performed in China, raising concerns about potential geographic and population-specific biases. Prospective multicenter validation is essential before AI models can be widely implemented in clinical practice. Although diagnostic performance metrics, such as AUC and accuracy, have been consistently reported, the issue of interpretability remains a major limitation.
Table 4. Studies’ Conclusions and Limitations.
Abbreviations: AI, artificial intelligence; CMS, chronic maxillary sinusitis; CNN, convolutional neural network; CRS, chronic rhinosinusitis; CRSwNP, chronic rhinosinusitis with nasal polyps; CT, computed tomography; LMS, Lund–Mackay score; MLMs, machine learning models.
Future studies should focus on the following key topics. First, the development and validation of AI models should be undertaken through large multicenter studies to ensure generalizability across diverse clinical settings and patient populations. Second, the adoption of standardized methodologies and reporting frameworks is critical for facilitating cross-study comparisons and reproducibility. Finally, integrating CT imaging data with additional patient information, such as symptom profiles, laboratory findings, and clinical history, may further improve diagnostic accuracy and enhance model performance.
In summary, this review reinforces the growing body of evidence that AI, particularly deep learning models, can substantially improve the diagnostic accuracy of CRS when applied to CT imaging. However, to fully realize their clinical potential, concerted efforts are required to standardize methodologies, enhance interpretability, and validate models in diverse prospective clinical environments.
Conclusion
This systematic review demonstrates that AI, particularly deep learning-based models, shows considerable promise for CT-based diagnosis of CRS. Most models achieved high diagnostic performance, with sensitivity, specificity, and accuracy metrics comparable to those of expert interpretations. Nonetheless, the variability in datasets, reference standards, and validation strategies continues to limit the current clinical applicability of these models.
Footnotes
Author Contributions
N.F.A. contributed to the work as the first author; conceptualized the idea; designed the study protocol; assembled the research team; created the tables; and reviewed, edited, and proofread the final manuscript. N.F.A. and T.A. performed the literature review, extracted the data, finalized the tables, drafted the manuscript, and edited and finalized the manuscript. N.A. resolved conflicts and reviewed and proofread the final manuscript. A.A. reviewed and proofread the manuscript. R.A. supervised, reviewed, and proofread the final manuscript. All the authors have read and approved the final manuscript.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
All data generated or analyzed during this study are included in this published article.
