Abstract
Introduction
Early diagnosis of non-keratinizing nasopharyngeal carcinoma (NK-NPC) is a significant clinical challenge. This study assessed combined antibodies and built a nomogram for more accurate NK-NPC screening.
Methods
Clinical data of 1330 individuals at high risk of nasopharyngeal carcinoma (NPC) from June 2021 to December 2024 were collected retrospectively. They were randomly divided into a training set (n = 930) and a validation set (n = 400) at a ratio of 7:3. The training set was further divided into the NK-NPC group and the non-NK-NPC group. Univariate and multivariate analyses were used to screen for risk factors of cancer, based on which a risk prediction nomogram model was constructed. The predictive performance of the model was evaluated using indicators such as the area under the receiver operating characteristic curve (AUC), integrated discrimination improvement (IDI), decision curve analysis (DCA), and Youden index. Additionally, an external validation set (cases from January–May 2025 at the same hospital) further assessed the model.
Results
Sex, EBNA1-IgA, VCA-IgA, and Rta-IgG were independent risk factors for NK-NPC in high-risk populations (P < 0.05). For the nomogram model constructed from these factors, the areas under the receiver operating characteristic (ROC) curves in the training set and validation set were 0.898 and 0.963, respectively. Decision curve analysis showed that the net benefit of this model was higher than that of the traditional model within the threshold probability range of 10% to 60%. In external validation, the sensitivity of the model was 100% and the specificity was 87.8%.
Conclusion
The NK-NPC prediction nomogram model constructed in this study has high discriminative ability and good calibration. It can serve as an effective prediction tool for NK-NPC in populations at high risk of nasopharyngeal carcinoma.
Plain Language Summary
This study constructs and validates a nomogram for predicting non-keratinizing nasopharyngeal carcinoma (NK-NPC) in high-risk populations, based on sex, EBNA1-IgA, VCA-IgA, and Rta-IgG. Retrospective analysis of 1330 high-risk individuals (June 2021–December 2024) identified these 4 factors as independent risk factors. The model shows high discriminative ability (AUC 0.898 in the training set, 0.963 in the validation set) and good calibration. External validation (January–May 2025) showed 100% sensitivity and 87.8% specificity. The model integrates accessible factors, is easy to operate, and serves as an effective predictive tool for NK-NPC in high-risk populations, aiding clinical practice.
Introduction
The global incidence of nasopharyngeal carcinoma (NPC) varies significantly, with higher prevalence in Southeast China, Southeast Asia, northeastern India, and North Africa.1,2 NPC originates from the epithelial cells of the posterior nasopharynx and typically shows evidence of squamous differentiation. Currently, the World Health Organization (WHO) classifies NPC into 3 major histological subtypes: non-keratinizing squamous cell nasopharyngeal carcinoma (NK-NPC), keratinizing squamous cell nasopharyngeal carcinoma (K-NPC), and basaloid squamous cell nasopharyngeal carcinoma. 3 In China, NK-NPC is the most common subtype, while the other types are relatively rare. NPC is characterized by an insidious onset and non-specific early symptoms, so the majority of patients are diagnosed at an advanced stage and miss the optimal treatment window. Despite recent improvements in treatment modalities, particularly the application of intensity-modulated radiation therapy, patients with advanced NPC often have a low quality of life and suffer from severe treatment-related complications, such as difficulty eating, dry mouth, and other oral side effects. 4
Keratinizing squamous cell nasopharyngeal carcinoma is potentially associated with HPV infection, while non-keratinizing squamous cell nasopharyngeal carcinoma is associated with Epstein-Barr virus (EBV) infection. 5 EBV infection can stimulate the body to produce various non-neutralizing antibodies, including viral capsid antigen IgA antibodies (VCA-IgA), nuclear antigen 1 IgA antibodies (EBNA1-IgA), and replication and transcription activator IgG antibodies (Rta-IgG), which play an important role in the screening of NPC. Current evidence indicates that detecting EBV-specific antibodies in peripheral blood remains one of the most convenient and rapid methods for NPC screening. However, single-antibody testing is not ideal for NPC detection due to high individual positivity rates (VCA-IgA: 6.51%, VCA-IgG: 70.41%, EBNA-IgG: 63.70%). 6 Combining multiple EBV-related antibodies significantly improves screening performance, particularly the dual-marker strategy of VCA-IgA and EBNA1-IgA, which has demonstrated high sensitivity and specificity. 7
Nevertheless, most prior studies are retrospective case–control designs with limited specificity, sensitivity, and positive predictive value (PPV), and they rarely incorporate critical variables such as age and sex into model development. Prospective studies are notably lacking.
Notably, existing NPC risk prediction models are mostly based on data from all subtypes, failing to optimize prediction indicators specifically for NK-NPC; moreover, the serum markers (eg, Rta-IgG) relied on for early screening of NK-NPC lack validation data specific to this subtype, leading to a high clinical missed diagnosis rate. This critical gap in current research highlights the urgent need for targeted solutions for NK-NPC screening.
To address this gap, the present study recruited high-risk individuals from an NPC-endemic region to evaluate the diagnostic performance of combined VCA-IgA, EBNA1-IgA, and Rta-IgG antibodies. We further constructed an NK-NPC-specific nomogram-based predictive model that considered age and sex as candidate predictors, aiming to provide a more accurate and clinically applicable tool for NK-NPC screening and to fill the gap in subtype-specific risk prediction in current NPC research.
Materials and Methods
General Information
This retrospective study included individuals at high risk of NPC who visited the Affiliated Shunde Hospital of Jinan University from June 2021 to May 2025. Data from June 2021 to December 2024 were used for model construction and internal validation, and data from January to May 2025 were used for external validation. This study was conducted in adherence to the Declaration of Helsinki. The study design was approved by the Medical Ethics Committee of The Affiliated Shunde Hospital of Jinan University (approval number: 202101013; approval date: 3 March 2021). Due to the retrospective study design and the use of deidentified patient information, the requirement for informed patient consent was waived. The reporting of this study conforms to the TRIPOD guidelines. 8
Inclusion Criteria
Individuals meeting any one of the following criteria were included in this study:
Having a first-degree relative with NPC; positive qualitative test for serum EB-VCA antibodies; a history of secretory otitis media lasting for more than 2 weeks; nasopharyngeal mass suggested by nasal endoscopy.
Exclusion Criteria
Previous history of NPC; age ≤ 14 years; severe dysfunction of vital organs (heart, lung, liver, or kidney); immune-related diseases; pregnancy or lactation; non-compliance with follow-up examinations.
Data Collection
Baseline information was collected through the hospital’s electronic medical record system for all subjects who met the inclusion and exclusion criteria. The data included the following:
1. Clinical data. The main items collected were age, sex, family history (history of NPC in first-degree relatives), past medical history (including chronic diseases, history of tumors, etc.), laboratory test results, imaging examination results, and pathological biopsy results.
2. Serum VCA-IgA, EBNA1-IgA, and Rta-IgG levels. Specimen pretreatment: All participants had 5 mL of venous blood collected on the same day. Centrifugation was completed within 15 minutes, with a centrifugation radius of 8 cm and a speed of 3000 r/min. The serum obtained after centrifugation was stored at 4°C for testing. The levels of VCA-IgA and EBNA1-IgA in the serum were measured using the Electrochemiluminescence Immunoassay (ECLIA) method, with reagent kits provided by Shenzhen Yhalon Biotechnology Co., Ltd. This method is a semi-quantitative determination, with results expressed as COI (Cut-off Index) values. The levels of Rta-IgG in the serum were measured using the Enzyme-Linked Immunosorbent Assay (ELISA) method, with reagent kits provided by Tongxin Biotechnology (Beijing) Co., Ltd. The ELISA test was performed according to the kit instructions, with qualitative results (positive or negative).
3. Nasopharyngeal examination. Participants underwent high-definition nasal endoscopy or MRI and were assessed by 2 otorhinolaryngologists with over 10 years of experience. If nasopharyngeal malignancy was suspected, biopsy was performed for pathological confirmation.
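The centrifugation settings in the specimen pretreatment step are reported as a rotor radius and a rotational speed. For readers whose centrifuges are set in multiples of g, these can be converted to relative centrifugal force (RCF) with the standard relation RCF = 1.118 × 10⁻⁵ × r × N², where r is the radius in cm and N the speed in r/min. The short sketch below applies that relation to the reported settings; the function name is illustrative and is not part of the study protocol.

```python
def relative_centrifugal_force(radius_cm: float, speed_rpm: float) -> float:
    """Convert rotor radius (cm) and speed (r/min) to RCF in multiples of g."""
    return 1.118e-5 * radius_cm * speed_rpm ** 2


# Settings reported in the specimen pretreatment step: 8 cm radius, 3000 r/min.
rcf = relative_centrifugal_force(radius_cm=8.0, speed_rpm=3000.0)
print(f"RCF is approximately {rcf:.0f} x g")
```

Under these settings the conversion gives roughly 805 × g.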
Statistical Methods
Continuous data conforming to a normal distribution were expressed as mean ± standard deviation (x̄ ± s), and comparisons between 2 groups were performed using the independent samples t-test. Data not conforming to a normal distribution were described as median (interquartile range), M (Q1, Q3), and inter-group comparisons were conducted with the Wilcoxon rank-sum test. Categorical data were presented as counts or percentages and analyzed using the χ2 test or Fisher’s exact probability test. Independent predictors of NK-NPC were identified through univariate and multivariate logistic regression analyses, based on which a nomogram model was constructed. The predictive performance of the model was evaluated by plotting receiver operating characteristic (ROC) curves and calibration curves, while the clinical utility was assessed via decision curve analysis (DCA). External validation was used to evaluate the generalization ability of the model. Statistical significance was considered at a two-sided 5% level. All statistical analyses were performed using R software version 4.4.3.
Results
Population Characteristics
According to the inclusion criteria, a total of 1444 cases were initially screened. After applying the exclusion criteria, 1330 eligible cases were included and divided into an NK-NPC group and a non-NK-NPC group. The latter comprised patients in whom NK-NPC had been excluded and who instead had conditions such as nasopharyngitis, adenoid remnants, or nasopharyngeal cysts. There were 64 cases (4.81%) in the NK-NPC group and 1266 cases (95.19%) in the non-NK-NPC group (Figure 1). In the NK-NPC group, according to the 8th edition of the AJCC staging criteria, there was 1 case in stage 0, 12 cases in stage Ⅰ, 11 cases in stage Ⅱ, 38 cases in stage Ⅲ, and 2 cases in stage Ⅳ. Statistically significant differences were observed between the 2 groups in age, sex, VCA-IgA, EBNA1-IgA, and Rta-IgG (P < 0.05) (Table 1). All data were randomly divided into a training set (930 cases) and an internal validation set (400 cases). The flowchart of the participant selection process is shown in Figure 1, and participant characteristics are shown in Table 1.
Figure 1. Flowchart of the selection process for study participants. NK-NPC, non-keratinizing squamous cell nasopharyngeal carcinoma.
Table 1. General information of participants with NK-NPC and non-NK-NPC (Jun. 2021 – Dec. 2024). Abbreviations: NK-NPC, non-keratinizing squamous cell nasopharyngeal carcinoma; VCA-IgA, EBV viral capsid antigen IgA antibodies; EBNA1-IgA, EBV nuclear antigen 1 IgA antibodies; Rta-IgG, EBV replication and transcription activator IgG antibodies; SD, standard deviation; M, median; Q1, 1st quartile; Q3, 3rd quartile.
Univariate and Multivariate Logistic Regression Analysis
Results of Univariate and Multivariate Logistic Regression
Abbreviations: OR, odds ratio; CI, confidence interval; VCA-IgA, viral capsid antigen IgA antibodies; EBNA1-IgA, Epstein-Barr virus nuclear antigen 1 IgA antibodies.
Construction of the Nomogram
Based on the 4 independent risk factors identified by multivariate logistic regression, a nomogram model was established to predict the risk of NK-NPC in populations at high risk of nasopharyngeal carcinoma (Figure 2).
Figure 2. The nomogram was constructed to predict the probability of non-keratinizing nasopharyngeal carcinoma (NK-NPC) in high-risk populations by incorporating four independent predictors: sex, EBNA1-IgA, VCA-IgA, and Rta-IgG.
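A nomogram is a graphical restatement of the underlying logistic regression: each predictor's contribution to the log-odds is rescaled to a 0 to 100 point axis, the points are summed, and the total is mapped back to a probability. The sketch below illustrates that mapping. All coefficients, the intercept, and the value ranges are hypothetical placeholders chosen for illustration; they are not the fitted values from this study. For simplicity, it also assumes every coefficient is positive.

```python
import math

# Hypothetical values for illustration only; NOT the fitted model of this study.
# Each entry: (log-odds coefficient, minimum value, maximum value).
INTERCEPT = -6.0
PREDICTORS = {
    "sex_male": (0.8, 0.0, 1.0),          # 1 = male, 0 = female
    "ebna1_iga_coi": (0.30, 0.0, 10.0),   # semi-quantitative COI value
    "vca_iga_coi": (0.25, 0.0, 10.0),     # semi-quantitative COI value
    "rta_igg_positive": (1.5, 0.0, 1.0),  # 1 = positive, 0 = negative
}

# The predictor with the largest possible contribution spans the full 0-100 axis.
MAX_CONTRIBUTION = max(beta * (hi - lo) for beta, lo, hi in PREDICTORS.values())


def points(name: str, value: float) -> float:
    """Nomogram points for one predictor value (assumes a positive coefficient)."""
    beta, lo, _hi = PREDICTORS[name]
    return 100.0 * beta * (value - lo) / MAX_CONTRIBUTION


def predicted_probability(patient: dict) -> tuple:
    """Return (total points, predicted probability) for one individual."""
    total_points = sum(points(name, value) for name, value in patient.items())
    # Log-odds when every predictor sits at its minimum (zero points).
    baseline = INTERCEPT + sum(beta * lo for beta, lo, _hi in PREDICTORS.values())
    linear_predictor = baseline + total_points * MAX_CONTRIBUTION / 100.0
    return total_points, 1.0 / (1.0 + math.exp(-linear_predictor))


example = {"sex_male": 1, "ebna1_iga_coi": 4.0, "vca_iga_coi": 5.0, "rta_igg_positive": 1}
total, probability = predicted_probability(example)
print(f"total points = {total:.1f}, predicted probability = {probability:.3f}")
```

Reading the total off the points axis and converting it to a probability gives the same answer as evaluating the logistic regression equation directly, which is why a printed nomogram can be used without software.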
Nomogram Model Validation and Evaluation
The nomogram model achieved an AUC of 0.898 (95% CI: 0.840-0.955) in the training set. Compared with the traditional model constructed using VCA-IgA and EBNA1-IgA (AUC 0.941, 95% CI: 0.922-0.961), the ROC curves of the 2 models did not differ significantly in either the training set (Z = −1.497, P = 0.1344) or the validation set (Z = −0.14911, P = 0.8815) (Figure 3A and B).
Figure 3. Verification of the nomogram prediction model. A: ROC of the nomogram and traditional model for the training set. B: ROC of the nomogram and traditional model for the validation set. C: Calibration curve of the nomogram model for the training set. D: Calibration curve of the nomogram model for the validation set. E: DCA curve of the nomogram and traditional model for the training set. F: DCA curve of the nomogram and traditional model for the validation set. The nomogram was constructed based on 4 factors: sex, EBNA1-IgA, VCA-IgA, and Rta-IgG. The traditional model was constructed based on 2 factors: EBNA1-IgA and VCA-IgA.
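The P values reported for the ROC comparison can be cross-checked from the Z statistics, assuming they are two-sided P values from a standard normal reference distribution (the usual convention for this kind of AUC comparison; the text does not name the specific test). The helper below is a minimal sketch with an illustrative function name.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided P value for a standard-normal test statistic."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2.0))


p_training = two_sided_p_from_z(-1.497)
p_validation = two_sided_p_from_z(-0.14911)
print(f"training set: P = {p_training:.4f}; validation set: P = {p_validation:.4f}")
```

Both values come out at approximately 0.134 and 0.881, in line with the reported P values.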
Nevertheless, the nomogram model showed excellent discrimination for the risk of NK-NPC. Integrated Discrimination Improvement (IDI) analysis indicated that adding sex and Rta-IgG to the new model significantly improved the predicted probabilities overall (training set IDI 0.170, 95% CI: 0.104-0.236, P < 0.001; validation set IDI 0.134, 95% CI: 0.052-0.216, P < 0.05).
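The IDI compares two models by their discrimination slopes, that is, the difference between the mean predicted probability among cases and the mean among non-cases. A positive IDI means the new model separates the two groups more widely than the old one. The sketch below computes the point estimate on a small made-up data set; the data and function names are illustrative assumptions, and the confidence intervals and P values reported above would require additional resampling or asymptotic formulas not shown here.

```python
from statistics import mean


def discrimination_slope(probabilities, outcomes):
    """Mean predicted probability in cases minus mean in non-cases."""
    cases = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_cases = [p for p, y in zip(probabilities, outcomes) if y == 0]
    return mean(cases) - mean(non_cases)


def integrated_discrimination_improvement(p_new, p_old, outcomes):
    """IDI point estimate: slope of the new model minus slope of the old model."""
    return discrimination_slope(p_new, outcomes) - discrimination_slope(p_old, outcomes)


# Toy data (hypothetical): 2 cases followed by 4 non-cases.
outcomes = [1, 1, 0, 0, 0, 0]
p_old = [0.50, 0.30, 0.20, 0.10, 0.10, 0.10]  # e.g. a two-marker model
p_new = [0.70, 0.50, 0.10, 0.10, 0.05, 0.05]  # e.g. a four-factor model

idi = integrated_discrimination_improvement(p_new, p_old, outcomes)
print(f"IDI = {idi:.3f}")
```

In this toy example the new model raises predictions for cases and lowers them for non-cases, so the IDI is positive even though both models rank the individuals in the same order.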
The calibration plot demonstrated good agreement between predicted probabilities and observed outcomes in the internal validation set (Hosmer–Lemeshow χ2 = 9.75, P = 0.37).
In the training set, the Hosmer–Lemeshow test was significant (χ2 = 31.44, P = 0.002), which may reflect over-fitting. Additionally, the corresponding Brier scores were 0.034 and 0.031, indicating close agreement between the predicted probabilities and actual outcomes. Taken together, these results suggest that the model was adequately calibrated (Figure 3C and D).
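The Brier score is the mean squared difference between predicted probability and observed outcome, so its scale depends on how common the outcome is. A useful reference point is a non-informative model that assigns everyone the cohort prevalence, whose Brier score equals prevalence × (1 − prevalence). The sketch below computes that reference from the whole-cohort counts given in the text (64 cases among 1330 individuals). This is only a rough yardstick, since the reported Brier scores refer to subsets whose exact case counts are not stated here.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Whole-cohort counts from the text: 64 NK-NPC cases among 1330 individuals.
cases, total = 64, 1330
prevalence = cases / total
outcomes = [1] * cases + [0] * (total - cases)

# Non-informative reference: every individual gets the prevalence as prediction.
reference = brier_score([prevalence] * total, outcomes)
print(f"prevalence = {prevalence:.4f}, reference Brier score = {reference:.4f}")
```

The reference value is roughly 0.046, so Brier scores of 0.034 and 0.031 sit below what a prevalence-only prediction would achieve at this case mix.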
Decision curve analysis (DCA) showed that, across the threshold-probability range of 10%–60%, the nomogram model consistently delivered a higher net benefit (NB) than both the conventional model and the treat-all or treat-none strategies, indicating that the nomogram model provides clinically meaningful decision support (Figure 3E and F).
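Net benefit at a threshold probability pt weighs true positives against false positives using the odds pt / (1 − pt): NB = TP/n − (FP/n) × pt / (1 − pt). The treat-all strategy is the special case in which everyone is classified positive, and treat-none has a net benefit of zero. The sketch below evaluates these quantities on a small made-up data set; the data and function names are illustrative assumptions, not the study's data.

```python
def net_benefit(probabilities, outcomes, threshold):
    """Net benefit of referring everyone whose predicted risk is >= threshold."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit when every individual is referred regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (hypothetical): 2 cases among 10 individuals.
outcomes = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [0.80, 0.40, 0.30, 0.10, 0.05, 0.05, 0.05, 0.02, 0.02, 0.01]

for threshold in (0.10, 0.20, 0.40, 0.60):
    model_nb = net_benefit(predicted, outcomes, threshold)
    all_nb = net_benefit_treat_all(outcomes, threshold)
    print(f"pt = {threshold:.2f}: model NB = {model_nb:.3f}, treat-all NB = {all_nb:.3f}")
```

A decision curve plots these values over a range of thresholds; a model is useful at a given threshold when its curve lies above both the treat-all curve and the zero line.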
External Validation
General Information of Participants With NK-NPC and Non-NK-NPC (Jan. 2025 – May 2025)
Abbreviations: NK-NPC, non-keratinizing squamous cell nasopharyngeal carcinoma; VCA-IgA, EBV viral capsid antigen IgA antibodies; EBNA1-IgA, EBV nuclear antigen 1 IgA antibodies; Rta-IgG, EBV replication and transcription activator IgG antibodies; SD, standard deviation; M, median; Q1, 1st quartile; Q3, 3rd quartile. Z: Mann-Whitney test, χ2: Chi-square test.
Across the training, internal-validation, and external-validation cohorts, the nomogram exhibited stable, high discriminative performance. In the external-validation cohort, the model achieved a sensitivity of 100%, specificity of 87.8%, Youden index of 0.878, and overall accuracy of 88.5%; the negative predictive value was 100% (95% CI 99.3-100%), indicating excellent rule-out capability.
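The metrics above all derive from the four cells of a confusion matrix, and the Youden index is simply sensitivity + specificity − 1. The sketch below shows the arithmetic. The true-positive and false-negative counts follow from the 12 external-validation NK-NPC cases and the 100% sensitivity stated in this article; the false-positive and true-negative counts are hypothetical values chosen only so that the percentages match those reported, and they should not be read as the study's actual table entries.

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard screening metrics from the four cells of a confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_index": sensitivity + specificity - 1.0,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "npv": tn / (tn + fn),
    }


# tp=12, fn=0: 12 NK-NPC cases, all detected (100% sensitivity, as reported).
# fp=24, tn=173: ASSUMED counts, consistent with the reported percentages only.
metrics = screening_metrics(tp=12, fp=24, fn=0, tn=173)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With perfect sensitivity the Youden index equals the specificity, which is why both are reported as 0.878.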
Optimal probability thresholds were determined from the maximum Youden index in the training set (nomogram 0.024; VCA-IgA + EBNA1-IgA 0.040). In the external-validation dataset, the nomogram significantly outperformed the conventional logistic-regression model, exhibiting superior sensitivity and accuracy for identifying NK-NPC (Figure 4 and Table 4).
Figure 4. Confusion matrices for external validation. A: confusion matrix of the nomogram; B: confusion matrix of the traditional model (VCA-IgA + EBNA1-IgA).
Table 4. Performance of the two models on different datasets. Abbreviations: PPV, positive predictive value; NPV, negative predictive value; VCA, VCA-IgA; EBNA1, EBNA1-IgA.
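Choosing a cut-off by the maximum Youden index means scanning candidate thresholds and keeping the one where sensitivity + specificity − 1 is largest. The sketch below does this on a small made-up data set; the data and function name are illustrative assumptions. When the outcome is rare, predicted probabilities are small for most individuals, so optimal cut-offs well below 0.5 (such as the 0.024 and 0.040 reported above) are not unusual.

```python
def youden_optimal_threshold(probabilities, outcomes):
    """Return (threshold, Youden index) maximizing sensitivity + specificity - 1."""
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    best_threshold, best_j = None, -1.0
    # Each distinct predicted probability is a candidate cut-off.
    for threshold in sorted(set(probabilities)):
        tp = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and y == 1)
        tn = sum(1 for p, y in zip(probabilities, outcomes) if p < threshold and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Toy data (hypothetical): 3 cases followed by 7 non-cases.
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [0.60, 0.30, 0.05, 0.20, 0.04, 0.03, 0.02, 0.02, 0.01, 0.01]

threshold, youden = youden_optimal_threshold(predicted, outcomes)
print(f"optimal threshold = {threshold}, Youden index = {youden:.3f}")
```

In this sketch the threshold is fixed on one data set; applying it unchanged to a separate cohort, as described for the external validation, avoids re-optimizing the cut-off on the data used to judge it.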
Discussion
Southern China is a high-incidence area for nasopharyngeal carcinoma (NPC). Because the symptoms of early-stage NPC are insidious, the majority of patients are diagnosed at an advanced stage, with less than 30% of patients presenting with early-stage NPC (AJCC stage I and II). 9 Therefore, developing an effective and cost-efficient screening method to identify NPC patients based on risk factors or tumor markers is of significant importance for residents of southern China. 5 Traditional risk prediction models apply logistic regression to EBV antibody levels, 10 and do not include important variables such as age and sex. In this study, we constructed a new nomogram model based on sex, EBNA1-IgA, VCA-IgA, and Rta-IgG to clinically predict the occurrence of NK-NPC.
Compared with the traditional model, the nomogram model showed no significant difference in the ROC curves for the training and validation sets, consistent with similar discriminative ability as reflected by the AUC. However, because ROC curves assess the rank order of predicted risks rather than their magnitude, the Integrated Discrimination Improvement (IDI) analysis provided a more nuanced evaluation: it indicated that the nomogram separated the predicted probabilities of NK-NPC and non-NK-NPC individuals more widely than the traditional model. This improvement was further supported by decision curve analysis (DCA), which demonstrated greater clinical utility by quantifying net benefit across practical threshold probabilities.
Previous studies have shown that VCA-IgA, EBNA1-IgA, and Rta-IgG can be used as predictive factors for NPC, either individually or in combination, but they did not include age and sex. Numerous studies have demonstrated that age and sex are independent risk factors for NPC. 11 Some scholars have constructed machine learning models for NPC diagnosis prediction based on symptom characteristics, age, and sex, achieving good results. 12 Additionally, scholars have developed machine learning models for liver cancer diagnosis prediction based on alpha-fetoprotein and age, achieving good diagnostic performance. 13 Among the independent predictive factors identified in this study, sex and Rta-IgG were added relative to the traditional model. The resulting positive Integrated Discrimination Improvement indicates an overall improvement in the predictive ability of the new model, which was also reflected in the decision curve analysis showing enhanced clinical utility. Although age has been identified as an NPC risk factor in prior studies, it was excluded from our final model. This exclusion was primarily due to the small number of non-keratinizing nasopharyngeal carcinoma (NK-NPC) cases in our cohort (n = 64); the limited number of cases was insufficient for logistic regression to detect the true association between age and NK-NPC risk. Consequently, age did not meet the statistical criteria for retention during variable selection.
In both high-incidence and non-high-incidence areas for NPC, the incidence rate in males is 2-3 times that of females,2,11,14 suggesting that sex can be used as a predictive factor for NPC. Furthermore, studies have shown a significant increase in the incidence of NPC in the age groups of 20 to 79 years. 11 However, in this study, age was not identified as an independent predictive factor in the multivariate logistic analysis, which may be due to the relatively small number of positive samples in this study.
In NPC screening, the combined detection of multiple EBV-related antibodies, including Rta-IgG, VCA-IgA, and EBNA1-IgA, is superior to single-antibody detection. 15 The Rta protein is the product expressed when the EBV gene BRLF1 is activated; it can transactivate various EBV latent-phase genes and a series of downstream genes, inducing abnormal cell division and carcinogenesis in nasopharyngeal cells.16-19 Studies have found that Rta is highly expressed only in NPC and is a specific serum marker for NPC. 20 A meta-analysis showed that Rta-IgG has a sensitivity of 82% and specificity of 92% in diagnosing NPC. 21
VCA is a capsid protein on the surface of EBV particles and is a marker of EBV infection and massive viral replication. EBV-VCA-IgA can be detected early in primary EBV infection and is the most widely studied and applied EBV antibody for NPC screening. 22 Due to the strong immunogenicity of the VCA antigen, most NPC patients show VCA-IgA positivity, making EBV-VCA-IgA highly sensitive for diagnosing NPC. 23
Epstein-Barr virus nuclear antigen 1 (EBNA1) is a nuclear antigen expressed during the latent period of EBV and is essential for latent EBV DNA replication and cell transformation. It is continuously expressed in epithelial cells of NPC tissue biopsies and is the only viral-encoded antigen expressed in all EBV-related tumors. Studies have found that EBNA1-IgA can effectively distinguish NPC patients up to 4 years before diagnosis, indicating its potential as an early detection biomarker. 24
A study from South China showed that combining VCA-IgA and EBNA1-IgA significantly improved NPC screening performance (sensitivity 91.45%, specificity 93.45%, AUC 0.978), outperforming single detection. 25
Whether all individuals at high risk of NPC should undergo nasopharyngeal endoscopy and imaging examinations is a critical issue faced by otolaryngologists, especially in clinical practice in NPC-endemic areas. Performing imaging examinations on all individuals at high risk of NPC would incur high economic costs. The nomogram model constructed in this study shows high sensitivity and specificity in predicting NK-NPC in populations at high risk of NPC, and this performance has been confirmed in both the internal validation set and the external validation set.
It is particularly important to note that this model can only be used to predict NK-NPC and cannot predict other types of NPC, although other types of NPC are extremely rare in NPC-endemic areas. This study has the following limitations. First, the research data are derived only from populations at high risk of NPC and have not been validated in the general population, so the model's applicability in the general population is unclear, which may limit the generalizability of the results. Second, the external validation cohort included only 12 NK-NPC cases, which may affect the robustness of conclusions on the model’s generalizability; future multi-center studies with larger NK-NPC sample sizes are needed to further verify the model’s applicability. Third, our model’s predictive performance was enhanced by including Rta-IgG. However, Rta-IgG is not a routine test in most hospitals, and its standardization and availability are limited. Currently, it should serve as a supplementary marker, used alongside routine indicators (eg, VCA-IgA, nasal endoscopy findings). Future multicenter validation is necessary to unify its testing protocols, verify inter-laboratory consistency, and facilitate its wider clinical use. Finally, the study population is from an NPC-endemic area, and the model's performance in populations from non-endemic areas needs further verification.
Conclusion
The nomogram model for predicting NK-NPC constructed in this study, which is based on sex, EBNA1-IgA, VCA-IgA, and Rta-IgG, exhibits high discriminative ability (AUC = 0.898 in the training set and AUC = 0.963 in the validation set) and good calibration. In external validation, it achieves a sensitivity of 100% and a specificity of 87.8%. Compared with traditional models, the nomogram model constructed in this study integrates easily accessible clinical factors and EBV antibody markers, is easy to operate, and is more suitable for promotion and application in primary medical institutions with limited resources. It can serve as an effective predictive tool for NK-NPC in high-risk populations of nasopharyngeal carcinoma, providing a reference for clinical practice.
Footnotes
Acknowledgements
We would like to thank the Department of Clinical Laboratory, The Affiliated Shunde Hospital of Jinan University for their assistance in sample testing and data collection.
Ethical Considerations
This study was conducted in adherence to the Declaration of Helsinki. The study design was approved by the Medical Ethics Committee of The Affiliated Shunde Hospital of Jinan University (approval number: 202101013; approval date: 3 March 2021).
Consent to Participate
Due to the retrospective study design and the use of deidentified patient information, the requirement for informed patient consent was waived.
Author Contributions
Authors’ contributions: FX and YL have complete access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. FX and YL contributed equally to this work. Study conception and design: FX, JL, and HY. Acquisition of data: FX and YL. Analysis and interpretation of data: FX, XZ, HL, and RL. Drafting of manuscript: FX and YL. All authors have read and approved the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Foshan City Medical Research Project (No. 20230024) and the Medical Cultivation Specialty Project of the 14th Five-Year Plan in Foshan. It was also funded by The Scientific Research Cultivation Special Fund of The Affiliated Shunde Hospital of Jinan University (202101013). The funders had no role in the study design, data collection and analysis, preparation of the manuscript, or decision to publish.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data analyzed during the current study are available from the corresponding author on reasonable request.
