Abstract
Objectives
The aim of this study was to develop and internally validate a hepatocellular carcinoma (HCC) risk prediction model incorporating repeated-measures data (longitudinal model), and to compare its predictions with those of a baseline model.
Methods
A total of 1097 participants with chronic hepatitis C who had received direct-acting antiviral (DAA) treatment were included in this prospective cohort study. The framework of joint models for longitudinal and survival data was used to construct the longitudinal prediction model. For comparison, a baseline model incorporating the same predictors was constructed using multivariate Cox regression. Model performance was evaluated using the dynamic discrimination index (DDI), areas under the receiver-operating characteristics curves (AUROC), and Brier scores.
Results
Over a median follow-up of 7.25 years, 60 patients (5.5%) developed HCC. Key risk factors identified were aspartate aminotransferase (AST), cholinesterase, gamma-glutamyl transferase (GGT), albumin, hemoglobin (Hb), platelet count, alpha-fetoprotein (AFP), carbohydrate antigen-125 (CA-125), and carcinoembryonic antigen (CEA). The final joint model, with GGT and CEA removed, showed superior average predictive performance (DDI = .871) compared to models with all predictors included. Validation showed high predictive accuracy for HCC, with AUROCs of .90 or above for 1-, 3-, 4-, and 5-year predictions. In comparison, the baseline Cox model achieved only modest AUROCs of about .7 (.75, .67, .69, and .67, respectively).
Conclusion
Compared to static models, our dynamic prediction model can predict the risk of HCC in patients after DAA treatment more accurately, providing better information to distinguish high-risk populations.
Introduction
Hepatitis C virus (HCV) infection is a significant global public health concern, with approximately 56.8 million viraemic HCV infections worldwide at the beginning of 2020. 1 HCV infection can follow either an acute or a chronic disease course, with approximately 70% of infected individuals developing chronic HCV infection. 2 Hepatocellular carcinoma (HCC) is a critical clinical outcome for patients with chronic HCV infection, causing poor outcomes and a heavy health care burden. 3 The development of direct-acting antivirals (DAAs) has altered the landscape of HCV-related HCC, and extensive independent studies have confirmed that a high proportion of patients achieve sustained virologic response (SVR) after a DAA regimen and exhibit a significantly decreased risk of HCC. 4
Although DAA treatment regimens make HCV elimination possible, a residual risk of HCC persists even after HCV eradication. 5 Identifying the subsets of post-DAA treatment patients at high risk of HCC would help clinicians provide proactive and intensified monitoring and management for them. Several previous studies have identified risk factors for HCC among individuals with chronic hepatitis C infection (eg, age, liver fibrosis, platelet count, and type 2 diabetes) and developed risk prediction models accordingly. 6 Most of these models, however, include only baseline predictors collected at a single time point, and may therefore neglect the dynamic nature of HCC risk as certain risk factors change over time. Longitudinal models with repeated measurements of the potential predictors may be capable of accurately capturing the fluctuating risk of HCC among patients who have received DAAs. 7 Therefore, the current study aimed to construct a longitudinal predictive model to predict the long-term risk of HCC onset among post-DAA treatment patients with chronic hepatitis C infection and to compare the performance of the longitudinal model with that of a conventional baseline model.
Material and Methods
Patients and Follow-Up
The current prospective cohort study is part of the “Chronic Hepatitis C Research Program of Jiangsu” (CHCRPJ) project. Participants with chronic hepatitis C were recruited at Jurong People’s Hospital between 1/1/2012 and 31/12/2023. Chronic hepatitis C can be diagnosed using serological (antibody) tests or molecular diagnostics (presence of viral RNA). Participants with missing data on the required serum biomarkers were omitted. Participants with prior or baseline (within 6 months from DAA treatment initiation) HCC events were further excluded. Eventually, a total of 1097 participants were included in this study. The participant inclusion process is summarized in Figure 1.
Figure 1. Flow chart. HCC, hepatocellular carcinoma.
The index date of the study was defined as the date of receiving DAA regimen. Patients were followed until the study outcome, death or 31/12/2023, whichever came first. The study outcome was new onset of HCC after the index date. Information on HCC occurrence before and after treatment was obtained from hospital inpatient and outpatient diagnoses. Laboratory tests were carried out at the date of receiving DAA regimen and annually after that when patients returned for their follow-up visit.
Chronic hepatitis C was defined as being positive for both anti-HCV and HCV RNA. 8 HCC was diagnosed according to the American Association for the Study of Liver Diseases (AASLD) guidelines. 9
This study was conducted in line with the Declaration of Helsinki. The study protocol was approved by the Institutional Review Board of Nanjing Medical University (Approval No. 95/2014, 445/2017, 528/2019, and 939/2022), and written informed consent was obtained from all the study participants.
Selection of Predictors
The potential predictors utilized in model development were selected based on their availability and associations with HCC as described in previous literature.10-12 The predictors were further classified into 2 categories: baseline predictors and longitudinal predictors.
Baseline factors, including gender, age, use of ribavirin, smoking status, alcohol use, and hypertension, were collected at enrollment and treated as fixed over time. Univariate Cox regression analysis was used to estimate the effects of baseline factors on the risk of HCC occurrence. Only those baseline factors that were statistically significant in the univariate analysis were retained as the final survival predictors.
The longitudinal factors, which might change over time, were collected at enrollment and repeatedly measured during follow-up visits. These factors included serum biomarkers such as total bilirubin (TBIL), direct bilirubin (DBIL), alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (ALP), cholinesterase, gamma-glutamyl transferase (GGT), albumin, hemoglobin (Hb), platelet count, alpha-fetoprotein (AFP), urea, creatinine, total bile acid (TBA), triglycerides (TG), total cholesterol (T-Chol), high-density lipoprotein-cholesterol (HDL-C), low-density lipoprotein-cholesterol (LDL-C), apolipoprotein A1 (ApoA1), apolipoprotein B (ApoB), carbohydrate antigen-199 (CA-199), carbohydrate antigen-125 (CA-125), and carcinoembryonic antigen (CEA). Joint models were run separately for each longitudinal factor to estimate the impact of its changes over time on the risk of HCC occurrence. Only those longitudinal factors found to be statistically significant in the individual joint modelling analysis were retained as the final longitudinal predictors.
Patients attended follow-up visits at variable time intervals. At each visit, their serum biomarkers, including the aforementioned longitudinal predictors, were measured. If a follow-up visit was missing a measurement for any of the longitudinal predictors, all data from that visit were excluded. Consequently, the time intervals between the repeated measurements for each patient were irregular.
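The visit-level exclusion rule described above can be sketched as follows. This is only an illustration under stated assumptions: the predictor names, the record layout, and the helper `complete_visits` are hypothetical and are not taken from the study's analysis code.

```python
# Hypothetical predictor subset and visit records, for illustration only.
PREDICTORS = ("AST", "AFP", "Hb")


def complete_visits(visits, predictors=PREDICTORS):
    """Keep only visits in which every longitudinal predictor was measured."""
    return [v for v in visits if all(v.get(p) is not None for p in predictors)]


visits = [
    {"patient": 1, "year": 0.0, "AST": 35.0, "AFP": 4.1, "Hb": 140.0},
    {"patient": 1, "year": 1.1, "AST": 30.0, "AFP": None, "Hb": 138.0},  # AFP missing
    {"patient": 1, "year": 2.4, "AST": 28.0, "AFP": 3.8, "Hb": 141.0},
]

kept = complete_visits(visits)
print([v["year"] for v in kept])  # [0.0, 2.4]
```

Dropping the incomplete visit at year 1.1 leaves measurements at years 0.0 and 2.4, which shows how this rule by itself produces irregular spacing between a patient's repeated measurements.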
Model Development
Two models, namely the longitudinal and baseline models, were constructed to predict the occurrence of HCC among post-DAA patients with chronic hepatitis C, and their performance was compared. The longitudinal model was constructed within the joint model framework, which consists of 2 linked sub-models: a survival sub-model and a longitudinal sub-model.
The survival sub-model was fitted to the survival predictors. Specifically, we started by fitting a multivariate Cox proportional hazards regression model to the baseline predictors that were statistically significant in the univariate Cox regression analysis. For the longitudinal sub-model, a non-linear mixed-effects model was fitted using the nlme R package, in which AST, GGT, Hb, AFP, CA-125, and CEA were involved (all variables were natural log-transformed). We included the main effect of time (the time points at which the corresponding longitudinal responses were recorded) in the fixed-effects part, and an intercept and a time term in the random-effects part.
After fitting the separate sub-models, we jointly modeled the longitudinal responses and time-to-event data using the R package JMbayes2, which fits joint models under a Bayesian approach using Markov chain Monte Carlo algorithms.
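For readers less familiar with joint models, a generic formulation consistent with the description above (a fixed effect of time, plus a random intercept and a random time term per patient) can be written as follows. The notation is an assumption introduced here for illustration. So is the choice of linking the hazard to the current underlying biomarker value, because the association structure actually used is not stated in the text.

```latex
% Longitudinal sub-model for log-transformed biomarker k of patient i
y_{ik}(t) = m_{ik}(t) + \varepsilon_{ik}(t), \qquad
m_{ik}(t) = (\beta_{0k} + b_{0ik}) + (\beta_{1k} + b_{1ik})\, t, \qquad
\varepsilon_{ik}(t) \sim N(0, \sigma_k^2)

% Survival sub-model: hazard of HCC given baseline covariates w_i
% and the current underlying biomarker values m_{ik}(t)
h_i(t) = h_0(t) \exp\Big\{ \gamma^{\top} w_i + \sum_{k} \alpha_k\, m_{ik}(t) \Big\}
```

Here $\beta_{0k}, \beta_{1k}$ are fixed effects, $b_{0ik}, b_{1ik}$ are patient-specific random effects, $h_0(t)$ is the baseline hazard, and $\exp(\alpha_k)$ would be read as the hazard ratio per unit increase in the log-transformed biomarker $k$ at time $t$.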
Joint modelling supports the inclusion of multiple longitudinal predictors, but model performance may not be optimal when all the longitudinal predictors are included in the joint model. 13 Therefore, we additionally fitted the joint model with 1 or 2 longitudinal predictors removed and compared model performance to identify the best longitudinal model. A total of 1 + 8 + 28 different longitudinal models were fitted and compared. In addition, because AFP has previously been identified as a biomarker of HCC, we also fitted a joint model containing only AFP for comparison. Finally, we developed a baseline model using multivariate Cox proportional hazards regression based on the same predictors as the best longitudinal model, but using only a single measurement taken at baseline.
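The count of 1 + 8 + 28 candidate models corresponds to removing 0, 1, or 2 predictors from a set of 8. A minimal sketch of that enumeration is shown below; the predictor labels are placeholders, since the count depends only on there being 8 candidates.

```python
from itertools import combinations
from math import comb

# Placeholder labels standing in for the 8 candidate longitudinal predictors.
predictors = [f"predictor_{i}" for i in range(1, 9)]

# One candidate predictor set per way of removing 0, 1, or 2 predictors.
candidate_sets = [
    tuple(p for p in predictors if p not in removed)
    for k in (0, 1, 2)
    for removed in combinations(predictors, k)
]

print([comb(8, k) for k in (0, 1, 2)])  # [1, 8, 28]
print(len(candidate_sets))  # 37
```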
Model Performance and Internal Validation
The study participants were randomly split into a training set (70%) and a validation set (30%). Baseline and longitudinal models were developed on the training set and assessed on the validation set. The predictive performance of the models was evaluated in terms of both discrimination and calibration. Discrimination of the baseline and longitudinal models at 1, 2, 3, 4, and 5 years was assessed using time-dependent areas under the receiver-operating characteristics curves (AUROC).14,15 Additionally, we used the dynamic discrimination index (DDI), which summarizes discrimination throughout the follow-up period, to compare the performance of the longitudinal models. 16 We calculated the DDIs with prognostic windows of 1, 2, and 3 years because most HCC events occur within this period. Brier scores, which capture both discrimination and calibration, were used as a metric of overall prediction accuracy. 14 Brier scores range between 0 and 1, with scores closer to 0 representing better calibration and better model performance.
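To make the two metrics concrete, the sketch below computes a Brier score and an AUROC at a single fixed horizon. It is deliberately simplified: it assumes every patient's status at the horizon is known, so it ignores censoring. It is not the time-dependent estimator used in the study, and the risks and outcomes are made-up values.

```python
def brier_score(pred_risk, event_by_horizon):
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    pairs = list(zip(pred_risk, event_by_horizon))
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


def auroc(pred_risk, event_by_horizon):
    """Probability that a random case has a higher predicted risk than a random
    control, with ties counted as one half."""
    cases = [p for p, y in zip(pred_risk, event_by_horizon) if y == 1]
    controls = [p for p, y in zip(pred_risk, event_by_horizon) if y == 0]
    score = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return score / (len(cases) * len(controls))


# Hypothetical predicted risks and outcomes (1 = HCC by the horizon).
risk = [0.02, 0.10, 0.40, 0.05, 0.30]
hcc = [0, 0, 1, 0, 1]

print(auroc(risk, hcc))  # 1.0
print(brier_score(risk, hcc))
```

In this toy example both cases have a higher predicted risk than every control, so discrimination is perfect. The Brier score is still well above 0 because the predicted risks for the cases (0.40 and 0.30) are far from 1, which illustrates why the two metrics are reported together.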
The joint model with the highest discrimination (as determined by the highest average DDI across all 3 prognostic windows) was chosen as the final longitudinal model. The prognostic performance of the final joint longitudinal model was then compared with that of a Cox model with the same predictors and of a joint model containing only AFP, using time-dependent AUROC values.
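The selection rule stated above (highest DDI averaged over the 1-, 2-, and 3-year windows) amounts to a simple arg-max. The sketch below uses invented model names and invented DDI values purely to show the rule; none of these numbers come from the study.

```python
from statistics import mean

# Invented DDI values for the 1-, 2-, and 3-year prognostic windows.
ddi_by_model = {
    "all predictors": (0.80, 0.82, 0.81),
    "drop A": (0.85, 0.86, 0.84),
    "drop A and B": (0.88, 0.87, 0.86),
}

# Choose the candidate with the highest DDI averaged over the three windows.
best_model = max(ddi_by_model, key=lambda name: mean(ddi_by_model[name]))
print(best_model)  # drop A and B
```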
Statistical Analysis
The reporting of this study conforms to TRIPOD guidelines. 17
Continuous variables were presented as median (interquartile range [IQR]), and categorical variables were presented as frequency (percentage). The follow-up time of patients was presented as median (range). All continuous variables were natural log-transformed before being included in the models. The averaged trajectory of each longitudinal predictor was estimated using mixed-effects models with random and fixed effects for measurement time. All statistical tests were two-sided, and statistical significance was set at P < .05.
All data analyses were performed in R (version 4.3.3). 18 The main statistical packages applied in the current study were as follows. The joint model was fitted using the JMbayes2 package, 19 and the Cox proportional hazards regression models were constructed using the survival package. 20 The time-dependent AUROC was calculated using the risksetROC package. 21
Results
Baseline Characteristics of Patients
Baseline Characteristics of Patients by Outcome Grouping.
Continuous variables were presented as median (IQR), and categorical variables were presented as count (percentage). The follow-up time of patients was presented as median (range).
Abbreviations: ALT, alanine aminotransferase; AST, aspartate aminotransferase; ALP, alkaline phosphatase; GGT, gamma-glutamyl transferase; Hb, hemoglobin; AFP, alpha-fetoprotein; TBA, total bile acid; TG, triglycerides; T. Chol, total cholesterol; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; ApoA1, apolipoprotein A1; ApoB, apolipoprotein B; CA-199, carbohydrate antigen-199; CA-125, carbohydrate antigen-125; CEA, carcinoembryonic antigen.
Incidence of HCC after DAA Treatment in HCV Patients
During a median follow-up of 7.25 years (range, 0.51-12.45), 60 patients (5.5%) developed HCC. The HCC incidence rate was 7.22 per 1000 person-years (PY) during the first year, gradually increasing to 10.94 per 1000 PY in the third year and decreasing afterward. The cumulative incidences at 1, 2, 3, 4, 5, and 6 years after SVR were 0.3%, 1.4%, 2.5%, 3.5%, 4.0%, and 4.5%, respectively. Throughout the follow-up period, HCC occurred as early as .64 years and as late as 9.9 years after treatment. Figure 2 shows the timing of HCC occurrence and its distribution during the follow-up period.
Figure 2. Time of HCC occurrence and its distribution. HCC, hepatocellular carcinoma.
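Incidence rates per 1000 PY, such as those reported for each follow-up year, are obtained by dividing the number of events in an interval by the person-time contributed in that interval. The sketch below shows the arithmetic on invented follow-up times; the helper names are hypothetical and the numbers are not from the study.

```python
def person_years_in_interval(follow_up_years, start, end):
    """Total person-time contributed between start and end (in years)."""
    return sum(max(0.0, min(t, end) - start) for t in follow_up_years)


def incidence_per_1000_py(events, person_years):
    """Incidence rate expressed per 1000 person-years."""
    return 1000.0 * events / person_years


# Invented follow-up times (years) for four patients.
follow_up = [0.5, 1.0, 2.0, 3.0]

py_second_year = person_years_in_interval(follow_up, 1.0, 2.0)
print(py_second_year)  # 2.0
print(incidence_per_1000_py(1, py_second_year))  # 500.0
```

Only the two patients followed beyond year 1 contribute person-time to the second year, so a single event in that interval gives 1 / 2.0 PY, or 500 per 1000 PY in this toy example.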
Trajectories of Longitudinal Factors Over Time
A total of 6361 sets of laboratory results were obtained at different time points from the 1097 patients. The longest time span between laboratory results was 12.45 years, during which the average number of measurements per patient was 5.8 (range, 1-40). All laboratory values were natural log-transformed for analysis as longitudinal factors. To demonstrate the evolution of these factors over time, we plotted the individual and mean trajectories of the 6 longitudinal factors for the entire cohort of patients with and without HCC in Figure 3.
Figure 3. Trajectories of 6 longitudinal predictors in patients with and without HCC from the entire cohort. The longitudinal predictors were AFP, AST, platelet count, cholinesterase, HGB, and CA125. The grey lines represent the individual trajectories of each patient; the blue lines are the averaged trajectories estimated using linear mixed-effects models. The values of all predictor variables are on a log scale. Abbreviations: HCC, hepatocellular carcinoma; AFP, alpha-fetoprotein; AST, aspartate aminotransferase; HGB, hemoglobin; CA125, carbohydrate antigen 125.
Dynamics of Longitudinal Factors Associated With the Risk of HCC
Factors Associated With HCC Risk in Patients in Joint Model.
Continuous variables were log transformed.
HR, hazard ratio; CI, confidence interval of HR; AST, aspartate aminotransferase; GGT, gamma-glutamyl transferase; Hb, hemoglobin; AFP, alpha-fetoprotein; CA-125, carbohydrate antigen-125; CEA, carcinoembryonic antigen.
Feature Reduction
Baseline Factors Associated With HCC Risk in Patients.
HR, hazard ratio; CI, confidence interval of HR.

DDI for 1-year, 2-year, and 3-year prognostic windows for longitudinal models with 1 or 2 predictors removed. DDI, dynamic discrimination index; GGT, gamma-glutamyl transferase; AFP, alpha-fetoprotein; CEA, carcinoembryonic antigen.
Summary of Final Joint Model.
AST, aspartate aminotransferase; Hb, hemoglobin; AFP, alpha-fetoprotein; CA-125, carbohydrate antigen-125; CEA, carcinoembryonic antigen; Coef, regression coefficient; HR: hazard ratio; CI, 95% confidence interval of HR.
Performance of Prediction Models in the Validation Set
Validation of the models was performed on a random 30% split of the entire study cohort. The characteristics of patients in the 2 groups are displayed in Table 1. The validation set was not used in model development. Three years after enrollment, 255 out of 330 patients in the validation set were still at risk of HCC. In this subset of patients, the longitudinal model showed excellent performance in predicting HCC events occurring within 1, 3, 4, and 5 years, with AUROCs all at or above .90 (.90, .92, .95, and .93, respectively). For 2-year prediction, the performance of the longitudinal model was good as well, with an AUROC of .83. Furthermore, the longitudinal model achieved low Brier scores in the 1-, 2-, 3-, 4-, and 5-year predictions of HCC (.0138, .0203, .0144, .0099, and .011, respectively).
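The idea of predicting from a landmark time, using only patients still at risk at that time, can be sketched as follows. The records and the helper `at_risk` are hypothetical and only illustrate how the risk set and the prediction horizons are defined; they are not the study's validation code.

```python
def at_risk(patients, landmark):
    """Patients still event-free and under follow-up at the landmark time."""
    return [p for p in patients if p["time"] > landmark]


# Invented records: follow-up time in years and whether HCC was observed.
patients = [
    {"id": 1, "time": 0.9, "hcc": 1},
    {"id": 2, "time": 3.6, "hcc": 1},
    {"id": 3, "time": 7.0, "hcc": 0},
    {"id": 4, "time": 2.5, "hcc": 0},
]

landmark = 3
risk_set = at_risk(patients, landmark)
print([p["id"] for p in risk_set])  # [2, 3]

# Prediction windows of 1 to 5 years from the landmark, on the baseline time scale.
print([landmark + window for window in (1, 2, 3, 4, 5)])  # [4, 5, 6, 7, 8]
```

Patients whose follow-up ended before year 3, whether through HCC or censoring, drop out of the risk set, and a 1- to 5-year prediction made at year 3 refers to years 4 through 8 from baseline.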
Comparison of the Performance Characteristics of the Longitudinal and Baseline Cox Models to Predict the Development of HCC.
AUROC, area under the receiver-operating characteristic curve.

Area under the receiver operating characteristic curve (AUROC) values of the baseline model, AFP-only model, and longitudinal model for predictions made 1, 2, 3, 4, and 5 years from year 1, year 2, year 3, and year 4. Predictions were made at year 1 for HCC occurrence 1, 2, 3, 4, and 5 years from year 1, which equals 2, 3, 4, 5, and 6 years from baseline. Predictions were made at year 2 for HCC occurrence 1, 2, 3, 4, and 5 years from year 2, which equals 3, 4, 5, 6, and 7 years from baseline. Predictions were made at year 3 for HCC occurrence 1, 2, 3, 4, and 5 years from year 3, which equals 4, 5, 6, 7, and 8 years from baseline. Predictions were made at year 4 for HCC occurrence 1, 2, 3, 4, and 5 years from year 4, which equals 5, 6, 7, 8, and 9 years from baseline.
Discussion
This study was conducted to build a longitudinal predictive model to evaluate the long-term HCC risk among post-DAA treatment individuals with chronic hepatitis C infection. Three major findings were reported in the current study. Firstly, over the study follow-up period, different trajectories of AFP, AST, platelet count, cholinesterase, Hb, and CA-125 were observed between patients stratified by HCC development status. Secondly, several longitudinal factors were found to be associated with an elevated risk of HCC among post-DAA treatment patients with chronic hepatitis C, encompassing AST, cholinesterase, GGT, Hb, platelet count, AFP, CA-125, and CEA. Lastly, the longitudinal model with 6 factors exhibited better discrimination ability and calibration for identifying long-term HCC risk than the baseline and AFP-only models.
The wide use of DAA regimens has dramatically altered chronic hepatitis C management because of their excellent HCV eradication ability, making SVR achievable for a large proportion of patients.22-24 Despite this, a residual risk of HCC occurrence and recurrence remains among patients with chronic hepatitis C who have achieved SVR.5,24 For example, a recent large cohort study of U.S. veterans recruited 22,500 patients with hepatitis C (19,518 of whom achieved SVR); 271 HCC cases still occurred after DAA treatment completion, and HCC developed in 183 patients with SVR during 20,415 person-years of follow-up. 25 HCC has the worst 5-year survival probability of any cancer, and detecting it at an early stage would greatly improve survival. 26 Thus, identifying post-DAA patients with chronic hepatitis C who need intensified monitoring and management would be of great significance in reducing the risk of HCC onset and the poor outcomes related to HCC. Although several HCC risk prediction models have been constructed in previous studies, most of them select predictive features at baseline alone, neglecting the dynamic changes in these features, which may reduce predictive accuracy and lose information.6,27,28 Therefore, constructing a risk prediction model with longitudinal features may improve prediction accuracy and add information.
In this study, a total of 8 longitudinal features were identified as risk factors for HCC among post-DAA patients with chronic hepatitis C. Their trajectories differed between participants who developed HCC and those who did not, indicating that heterogeneous development patterns of these features may be associated with the onset of HCC. 29 Six of them were selected to construct the longitudinal prediction model. The final model exhibited excellent discrimination for HCC risk, as indicated by high time-dependent AUROC values ranging from .83 to .95 and low Brier scores ranging from .0099 to .0203 for predictions up to 5 years in the validation set, and it outperformed the baseline model. This result further highlights the importance of including dynamic changes in predictors (ie, predictors with longitudinal information) when predicting HCC risk in the target patient population. Of note, among the included longitudinal features, AFP had the greatest influence on the discrimination power of the model, as indicated by the changes in DDI during the feature reduction process. AFP is a widely used HCC-screening biomarker whose expression is induced by liver stem/progenitor cells, and an increasing serum AFP level usually indicates hepatocellular regeneration initiated by severe liver damage.30-32 Given the significance of AFP to HCC risk, we also investigated the discrimination power of AFP alone and compared it with that of the baseline model and the final longitudinal model with all features. Although the AFP-only model exhibited better discrimination than the baseline model, its discrimination power was much lower than that of the final model.
The current study had several strengths. Firstly, the prospective nature of the study design ensures that data on predictors and outcomes were collected systematically and reduces recall bias. Secondly, the study incorporates longitudinal predictors, which allows for a more dynamic and accurate assessment of HCC risk by considering how predictor variables change over time.
However, several limitations of this study should be noted. Firstly, this study was conducted at a single center in Jiangsu, China, which may limit the generalizability of the findings to other populations or settings. Thus, external validation of the current model is necessary before the results are extrapolated to other populations or settings. Secondly, the number of events was small relative to the number of variables considered, so there is a risk of overfitting.
Conclusion
This study developed and validated a longitudinal predictive model for HCC risk in chronic hepatitis C patients treated with DAAs, demonstrating superior accuracy by incorporating dynamic biomarkers. The model’s enhanced performance highlights the importance of longitudinal data for precise risk prediction. Future research should validate these findings in larger cohorts and explore clinical implementation to improve patient outcomes.
Supplemental Material
Supplemental Material for Dynamic Prediction of the Risk of Hepatocellular Carcinoma After DAA Treatment for Hepatitis C Patients by Xinyan Ma, Lili Huang, Meijie Yu, Rui Dong, Yifan Wang, Hongbo Chen, Rongbin Yu, Peng Huang and Jie Wang in Journal of Cancer Control.
Footnotes
Author Contributions
Conceptualization: Xinyan Ma, Jie Wang and Peng Huang; Data curation: Xinyan Ma, Lili Huang, and Meijie Yu; Formal analysis: Xinyan Ma and Lili Huang; Writing – original draft: Xinyan Ma and Rui Dong; Writing – review and editing: Yifan Wang, Rongbin Yu, Hongbo Chen, Peng Huang, and Jie Wang.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was sponsored by the Open Project of Jiangsu Health Development Research Center [grant number JSHD2022046].
Ethical Statement
Supplemental Material
Supplemental material for this article is available online.
