Abstract
Hepatitis B virus (HBV) causes liver cancer, which is the third most common cause of cancer-related death worldwide. Chronic HBV-induced inflammation in host hepatocytes causes hepatocyte remodeling (hepatocyte transformation and immortalization) and hepatocellular carcinoma (HCC). Recognizing cancer stages accurately to optimize early screening and diagnosis is a primary concern in the outlook of HBV-induced hepatocyte remodeling and liver cancer. Genomic signatures play important roles in addressing this issue. Recently, machine learning (ML) models and bioinformatics analysis have become very important in discovering novel genomic signatures for the early diagnosis, treatment, and prognosis of HBV-induced hepatic cell remodeling and HCC. We discuss the recent literature in which ML approaches and bioinformatics analysis have revealed novel genomic signatures for diagnosing and forecasting HBV-associated hepatocyte remodeling and HCC. Various genomic signatures, including microRNAs and their associated genes, long noncoding RNAs (lncRNAs), and small nucleolar RNAs (snoRNAs), have been found to be upregulated or downregulated in HBV-HCC. Moreover, these genetic biomarkers also affect different biological processes, such as proliferation, migration, circulation, assault, dissemination, antiapoptosis, mitogenesis, transformation, and angiogenesis in HBV-infected hepatocytes.
Introduction
Hepatitis B virus (HBV)-induced hepatocellular carcinoma (HCC) is responsible for approximately 90% of primary liver cancer cases worldwide. 1 Prospective cohort studies have shown that patients with chronic HBV infection have a nearly 100-fold increased risk of developing HCC.2 -4
There is evidence that the transformed protein products of the HBV accelerate the onset of cancer. When HBV infection occurs, hepatocyte remodeling and the development of HCC are closely linked to the transcriptional activation of differentiation-regulating genes by HBV-encoded proteins. Dysregulation of gene expression can result from the interaction of HBV-encoded proteins with the machinery of the host cell, especially those genes linked to cell differentiation. 5 In other words, comprehending the methods by which proteins encoded by HBV alter the transcription of genes that regulate differentiation is crucial for clarifying persistent HBV infection and hepatocyte transformation.6,7 Important transcription factors, epigenetic changes, and signaling pathways that affect hepatocyte gene expression patterns may interact throughout this process. 8
Moreover, the integration of HBV DNA into the hepatocyte genome is a significant step in the pathophysiology of HCC during persistent HBV infection. The incorporation of HBV DNA can result in several molecular and cellular alterations that support the development of HCC and hepatocyte remodeling. 9 It is also imperative to understand how incorporated HBV DNA impacts hepatocyte remodeling and plays a role in the development of HCC to devise tailored treatments that attempt to impede these mechanisms, which may lead to the discovery of biomarkers for early detection in people with persistent HBV infection.10,11
Furthermore, viral DNA sequences have the potential to disrupt genes in host cells during integration. This can result in changes in gene expression patterns, which can encourage cell division, prevent apoptosis, and cause genomic instability. 12 These alterations are believed to play a critical role in the development of HCC from chronic HBV infection. 13
Hepatocyte remodeling and the emergence of HBV-associated HCC can also be attributed to the dysregulation of many signaling pathways caused by the integration of HBV DNA into the hepatocyte genome. 14 Hence, integrated HBV DNA has been found to impact multiple signaling pathways, including the PI3K/Akt/mTOR pathway, the Ras/Raf/MEK/ERK pathway, and the Wnt/β-catenin pathway.15,16 These cascades are essential for controlling the survival, differentiation, and proliferation of hepatic cells. Through several different means, the incorporation of HBV DNA can activate these signaling pathways. Viral proteins generated by integrated HBV DNA, for instance, may interact directly with these pathways’ constituent parts or interfere with their regular control. 17 Within infected hepatocytes, this dysregulation may result in prolonged activation of pro-proliferative signals and the prevention of apoptosis. 18
Moreover, integrated HBV DNA may also cause persistent hepatic inflammation, which can activate inflammatory signaling cascades such as the JAK/STAT and NF-κB pathways. A microenvironment that fosters cellular changes and the growth of HCC can be produced by persistent inflammation. In addition, hepatic stellate cell signals are linked to extracellular matrix remodeling and the growth of myofibroblast-like cell types during HBV-associated hepatocyte immortalization. 19
Precise estimation of HBV-induced hepatocyte remodeling and HCC biomarker prediction contributes to the early detection of HBV-related liver cancer and decreases mortality. The direct selection of predictive biomarkers, without the need for subjective preselection, is made possible by machine learning (ML) techniques. 20
As the fields of big data science and statistical approaches have advanced, artificial intelligence (AI) has become increasingly applicable to every facet of human existence. AI focuses on creating analogs of human intelligence, reasoning, and understanding to solve problems, make decisions, and perform tasks. ML, a subset of AI, is a process that involves processing large volumes of data with a specific objective, autonomously grouping the information, and enabling the machine to evaluate it. 21 This is achieved by ingesting data and applying algorithms to develop a predictive model. 22
Currently, ML techniques are frequently employed to identify various medical signatures for patients with HBV-HCC. 23 By converting genetic and molecular data into a streamlined input mechanism, ML algorithms can utilize this information represented at the sequence level and analyze it through various methods. In other words, the ML approach has a complex role in HBV-induced hepatocyte remodeling and HCC in many facets of illness, and more advanced technologies have become available. 24
Because ML techniques can enhance diagnostic precision and predictive power, they have become increasingly popular in the analysis of HBV integration. A variety of ML models, such as support vector machines (SVMs), random forests (RFs), and eXtreme Gradient Boosting (XGBoost), have been utilized to forecast HBV infection and assess its integration into host genomes. Research has demonstrated that ensemble learning approaches, which integrate multiple algorithms, can improve the prediction accuracy and reduce the generalization errors associated with single classifiers. 25
Furthermore, studies have demonstrated the value of ML in identifying key clinical predictors for HBV integration, which is essential for early diagnosis and treatment strategies.25,26 Personalized treatment approaches are made possible by the incorporation of advanced algorithms that not only facilitate the stratification of individuals at risk but also contribute to the understanding of the molecular pathways underlying diseases associated with HBV. 27
Currently, the ML approach provides a thorough grasp of HBV progression through predictive analysis and identifies blood biomarkers that could be utilized as diagnostic, therapeutic, and prognostic signals for patients with HBV-caused hepatocyte remodeling and HCC. 20
Furthermore, bioinformatics has evolved into a field of study that offers a powerful means of interpreting biological research, including the diagnostic, therapeutic, and prognostic value of miRNAs and their related genes in HBV-induced hepatic malignancy, via the analysis of large-scale biological datasets. 28 Moreover, innovations in bioinformatics and science have recently revolutionized the generation and analysis of massive omics datasets. This has made it possible to integrate numerous datasets, such as the transcriptome, proteome, glycoproteome, metabolome, epigenome, and genome. 17
Various types of genomic signatures have been discovered via machine learning approaches and bioinformatics analysis to indicate HBV-associated hepatocyte remodeling and HCC. This review summarizes recent advances in understanding the role of these signatures in HBV-induced hepatocyte remodeling and hepatic cancer.
Machine Learning (ML) Approaches Revealed Genomic Signatures for Hepatitis B Virus-Associated Hepatocyte Remodeling and HCC
The identification of genetic biomarkers for HBV-HCC has undergone dramatic transformation, shifting from conventional statistical approaches to sophisticated ML strategies. 29
Traditional statistical analysis techniques, such as ANOVA, Pearson correlation, and
As a result, standard statistical tests are primarily used for exploratory data analysis and preliminary hypothesis testing. While they can identify genes with differential expression, their linear nature may lead to the oversight of complex associations. 31 In contrast, ML techniques offer greater flexibility by effectively managing intricate, non-linear correlations and interactions between variables, making them better suited for analyzing high-dimensional genomic datasets. 32
Depending on the characteristics of the dataset, a variety of methods, including SVM, RF, and XGBoost, can be employed. Each algorithm has advantages; for example, SVMs excel in high-dimensional environments, while RFs are resistant to overfitting. 33
In addition, naïve Bayes performs classification by assuming the independence of features, while
Moreover, to ensure the robustness of predictions, ML techniques often utilize cross-validation and a variety of performance metrics, including accuracy, sensitivity, and specificity. Recent research has demonstrated that ML methods can identify potential biomarkers with greater accuracy than traditional approaches. For example, LASSO regression combined with other ML techniques has successfully identified diagnostic biomarkers for liver diseases associated with HBV. 37
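As an illustration of the cross-validation idea mentioned above, the following is a minimal sketch of how samples might be partitioned into k folds so that each sample is used for testing exactly once. The function name `k_fold_indices`, the cohort size of 10, and the seed are assumptions made for this example and are not taken from any of the cited studies.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical cohort of 10 samples split into 5 folds.
splits = list(k_fold_indices(10, k=5))
for train_idx, test_idx in splits:
    print(sorted(test_idx), "held out;", len(train_idx), "used for training")
```

In practice, a model would be fitted on each training subset and scored on the corresponding held-out fold, and the per-fold scores would then be averaged.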
In other words, evaluating the performance of ML algorithms is essential for ensuring their effectiveness in making predictions, and this evaluation is conducted via various performance measures and validation techniques. The choice of measures depends on the specific problem being addressed, whether it involves classification or regression. 38
Among the key performance measures, accuracy is the simplest; it calculates the ratio of correctly predicted instances to the total instances in a dataset, making it particularly useful for balanced datasets where all classes hold equal importance. Accuracy = number of correct predictions/total number of predictions. However, when false positives carry significant consequences, as in medical diagnoses, precision becomes crucial; it measures the proportion of true positive predictions out of all positive predictions made by the model. 39
Similarly, recall (sensitivity) assesses how well the model identifies all relevant instances, focusing on true positives, which is critical in applications such as cancer detection, where failing to identify a positive instance (false negative) can have severe implications. 40
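The measures described above can be made concrete with a short sketch that derives accuracy, precision, recall (sensitivity), and specificity from the four cells of a binary confusion matrix. The labels and predictions below are invented for illustration (1 = HCC, 0 = non-HCC) and do not come from any dataset discussed in this review.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded as 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        # Correct predictions divided by all predictions.
        "accuracy": (tp + tn) / total,
        # True positives among all predicted positives.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # True positives among all actual positives.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # True negatives among all actual negatives.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical labels and predictions for 10 samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these toy values there are 3 true positives, 5 true negatives, 1 false positive, and 1 false negative, so accuracy is 0.8 and both precision and recall are 0.75.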
To balance precision and recall, the
To ensure that performance metrics accurately reflect model performance, a variety of validation techniques are employed. Cross-validation, particularly
ML can analyze multi-genomic data to uncover genotype-phenotype correlations that conventional methods might overlook. Therefore, while traditional statistical methods provide foundational insights into genetic data, ML serves as a powerful alternative that enhances predictive accuracy and reveals complex patterns essential for biomarker discovery in HBV-HCC. 46 Model selection, training, evaluation, and data preprocessing are the core tenets of ML. Data preprocessing ensures high-quality input by cleaning and standardizing genetic data, which is essential for accurate model performance. Data preprocessing consists of several steps in HBV-HCC genomic biomarker discovery, starting with data exploration and analysis to understand the genomic dataset, followed by handling missing values using techniques like imputation and performing data cleaning to ensure data integrity. 34 In addition, standardizing and normalizing bring uniformity to the data, while handling outliers addresses extreme values. 47 This is followed by data reduction, which simplifies the dataset, and feature selection, which is aimed at identifying significant attributes to enhance model efficiency. Feature engineering involves experimenting with new features to optimize model performance. 48
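Two of the preprocessing steps named above, imputation of missing values and standardization, can be sketched in a few lines. The example below assumes median imputation and z-score scaling, which are common choices but only two of several possible ones; the expression values are invented, and `None` is used here to mark a missing measurement.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def z_score(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical expression values of one gene across four samples.
expression = [2.0, None, 4.0, 6.0]
imputed = impute_median(expression)
scaled = z_score(imputed)
print(imputed)
print(scaled)
```

To avoid information leaking from test samples into training, the median, mean, and standard deviation would normally be estimated on the training samples only and then applied unchanged to held-out samples.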
Model selection involves choosing the appropriate algorithm, such as decision trees, SVMs, or neural networks, on the basis of the specific characteristics of the genetic data and the desired outcome. 49 During the training phase, the model is fed labeled genomic data to identify patterns associated with HBV-HCC. This process utilizes optimization techniques to minimize prediction errors. The model’s performance on unseen data is then evaluated using metrics such as accuracy and precision. 50
Additionally, feature engineering is crucial, as it involves selecting and transforming genetic features that significantly impact the model’s predictive capability. Hyperparameter tuning further enhances model performance by adjusting parameters that influence learning dynamics. By following these guidelines, researchers can effectively leverage ML to identify potential biomarkers that may aid in the early detection and management of HBV-related HCC. 34
High-throughput sequencing technologies and well-established databases provide the majority of input data used to derive genomic biomarkers associated with HBV-HCC. 51 Important resources include the Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO), which offer comprehensive gene expression profiles. For instance, RNA-seq data from patients with HBV infection, encompassing both normal and HCC tissue samples, can be found in datasets such as GSE25599, GSE77509, and GSE94660 from GEO. 52 These databases facilitate the identification of differentially expressed genes (DEGs) by comparing gene expression levels across various conditions. Furthermore, multi-omics techniques that integrate data from transcriptomics, proteomics, and genomics enhance our understanding of the molecular pathways underlying HBV-HCC, aiding in the discovery of potential biomarkers for diagnosis and treatment.53,54
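The comparison of expression levels across conditions can be illustrated with a deliberately simplified sketch that computes a log2 fold change and a Welch t-statistic per gene. The gene names, expression values, pseudocount, and cutoffs below are all hypothetical; published analyses typically rely on dedicated differential expression tools and apply multiple-testing correction, which this sketch omits.

```python
import math
import statistics


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean tumor to mean normal expression (with a pseudocount)."""
    return math.log2(
        (statistics.fmean(tumor) + pseudocount)
        / (statistics.fmean(normal) + pseudocount)
    )


def welch_t(a, b):
    """Welch's t-statistic for two groups with possibly unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error


# Hypothetical expression values for two genes in tumor and normal tissue.
expression = {
    "GENE_A": {"tumor": [15.0, 17.0, 16.0], "normal": [3.0, 4.0, 2.0]},
    "GENE_B": {"tumor": [5.0, 6.0, 4.0], "normal": [5.0, 4.0, 6.0]},
}

# Illustrative cutoffs only: |log2FC| >= 1 and |t| >= 2.
degs = []
for gene, groups in expression.items():
    lfc = log2_fold_change(groups["tumor"], groups["normal"])
    t_stat = welch_t(groups["tumor"], groups["normal"])
    if abs(lfc) >= 1.0 and abs(t_stat) >= 2.0:
        degs.append(gene)
print(degs)
```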
Different ML techniques possess distinct characteristics and applications for identifying genetic biomarkers associated with HCC linked to HBV. Supervised learning algorithms, such as SVM and RF, are commonly employed for classification tasks, while unsupervised methods like clustering assist in uncovering patterns within data without predefined labels. Additionally, deep learning techniques, including Convolutional Neural Networks (CNNs), excel at processing complex information, particularly high-dimensional genomic data and image analysis. 55
The range of applications for these techniques includes patient stratification, disease prediction, and biomarker identification. RNA-seq and microarray studies are common sources of high-quality, well-annotated data that are essential for successful implementation. 56 Such data allow ML models to accurately identify relevant biomarkers and DEGs. Furthermore, feature selection methods are frequently employed to focus on the most informative genetic features, reducing dimensionality and enhancing model performance. 57
Based on genomic data, ML methods have been created to predict the likelihood of developing HCC and hepatocyte remodeling caused by HBV. 58 These models can estimate a person’s risk of acquiring HBV-HCC by combining genetic and epigenetic data, which enables early intervention and individualized treatment plans. In other words, ML techniques have yielded new genetic biomarkers that are predictive of prognosis in patients with HBV-induced HCC and hepatocyte remodeling. 59
ML methods can find genetic signatures for hepatocyte remodeling and HCC associated with HBV from gene expression datasets. 60 ML models can uncover unique biomarkers linked with disease progression and survival outcomes in HBV-HCC patients by examining gene expression profiles and other genomic data. 61 Moreover, ML techniques have transformed our knowledge of the genomics of HBV-induced HCC and hepatocyte remodeling, paving the way for the identification of novel molecular targets for the diagnosis, prognosis, and treatment of HBV-HCC. 62
ML techniques have also been applied to the identification of particular genomic alterations linked to HBV-induced HCC and hepatocyte remodeling. 63 Through the examination of extensive genomic data, ML algorithms can detect patterns and signatures specific to HBV-HCC and hepatocyte remodeling, offering a valuable understanding of the molecular processes that underlie the illness. 64
To estimate possible genomic and molecular signatures for patients with HBV-HCC and hepatocyte remodeling from large genomic datasets, the following steps can be followed. First, samples of HBV-infected liver tissue that have developed HCC or undergone hepatocyte remodeling can be used to gather gene expression data. The expression levels of thousands of genes in each sample are usually included in these data. 65
Subsets of genes that are particularly important for differentiating between distinct phenotypes, including normal hepatocytes, modified hepatocytes, and HCC cells, can then be found using feature selection algorithms. 66 With the least amount of duplication among the features that are chosen, these algorithms seek to identify the genes that are most strongly associated with the desired phenotype. 67 In this technique, several feature selection algorithms can be used. In other words, using gene expression data, feature selection algorithms are one potential machine learning method to find genetic signatures for hepatocyte remodeling and HCC associated with HBV. 68
Recursive feature elimination (RFE), for instance, is a technique that repeatedly eliminates less significant features until the target number is retained. As an alternative, sparsity in the chosen features can be enforced using L1 regularization techniques like LASSO (Least Absolute Shrinkage and Selection Operator). 69 Following the identification of a subset of relevant genes through feature selection techniques, machine learning models, such as neural networks, SVM, random forests, gradient boosting machines, and k-means clustering, can be trained on this reduced feature set to categorize new samples into distinct phenotypes according to their gene expression profiles. 70
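The way L1 regularization enforces sparsity can be seen in a small sketch of LASSO fitted by coordinate descent with soft-thresholding. The objective assumed here is (1/(2n))·||y − Xβ||² + α·||β||₁, and the three-feature toy matrix, the response values, and α = 0.5 are invented so that the result can be followed by hand; this is not the implementation used in any cited study.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; values within [-alpha, alpha] become 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, alpha) / scale if scale > 0 else 0.0
    return beta


# Hypothetical centered expression matrix: 4 samples (rows) x 3 genes (columns).
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
# The response depends strongly on gene 0, weakly on gene 1, and not on gene 2.
y = [2.1, 1.9, -1.9, -2.1]

beta = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

In this toy example the weakly associated and unassociated genes receive coefficients of exactly zero, so only the first gene is retained, while its own coefficient is shrunk from 2.0 to 1.5 by the penalty.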
A variety of performance indicators, including accuracy, precision, recall, and area under the receiver operating characteristic curve (AUROC), can then be used to assess the trained model. Additionally, enrichment analysis techniques such as Gene Ontology or KEGG pathway analysis can be used to find additional important pathways or biological processes linked to the chosen genomic signatures.53,71
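The AUROC can be understood as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The sketch below computes it directly from that pairwise definition; the labels and model scores are invented for illustration.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                # Tied scores count as half a correct ranking.
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = HCC, 0 = control) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc_value = auroc(y_true, scores)
print(round(auc_value, 3))
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so the AUROC is 8/9. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but slower than rank-based implementations on large cohorts.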
It is conceivable to find genomic signatures predictive of remodeling and HCC development by applying this ML approach to genomic data from HBV-associated liver tissue samples with known phenotypes relevant to these diseases. 34 These signatures may function as biomarkers for the advancement of the disease or as targets for therapeutic interventions, and they may also shed light on the molecular mechanisms underlying HBV pathogenesis. To identify genetic signatures associated with the onset of HCC and HBV-associated hepatocyte remodeling, several machine-learning approaches might be investigated. 72 Among the techniques that are commonly used is supervised learning, which involves analyzing labeled data to identify molecular markers associated with the progression of HBV-related disease and training a model to predict particular outcomes. 73 Another type of ML approach, unsupervised learning, can identify distinct molecular signatures or subtypes linked to HBV-caused liver disorders without the need for predefined labels. 74
In addition, deep learning uses techniques such as deep neural networks to analyze enormous amounts of genomic data and find complex relationships and patterns that may shed light on the pathophysiology of HBV.64,75 Transfer learning, when working with sparsely labeled data for diseases associated with HBV, makes use of information gleaned from related activities or datasets to enhance predictive performance, 76 and feature selection and dimensionality reduction methods can be used to find important genomic features related to HBV-associated hepatocyte remodeling and HCC development77,78 (Figure 1).

Figure 1. The analysis of large-scale genomic datasets using a machine learning approach to discover key genomic signatures for HBV-HCC and hepatocyte remodeling.
ML Approaches Revealed microRNAs and Their Related Genes as Genomic Signatures for Hepatitis B Virus-Associated Hepatocyte Remodeling and HCC
The identification of HBV-related HCC is one possible clinical use for the discovery of miRNAs and their target genes. 79 Through the evaluation of particular miRNA levels in patient samples, medical professionals could be better equipped to identify HCC and distinguish it from other liver illnesses. This might result in an early diagnosis and course of treatment, which would benefit the patient. 80 Furthermore, these biomarkers may be useful in prognosticating the course and outcome of HBV-related HCC in patients. Clinicians may be better equipped to gauge a patient’s response to treatment and the chance of disease recurrence by tracking changes in miRNA levels over time. 81
Moreover, developing focused therapeutic approaches may result from knowledge of the function of particular miRNAs and the genes that they target in HBV-related HCC. 82 For instance, targeted medicines that attempt to alter the expression or activity of specific miRNAs may be investigated as possible therapeutic options if these miRNAs promote tumor growth or metastasis. 83 These miRNAs can be used as targets for innovative therapeutic strategies and can be integrated into the diagnostic and prognostic systems for HBV-related HCC. 84 A large amount of genomic data, including microRNA expression profiles, can be analyzed by ML algorithms to find trends and correlations with the course of disease. Through training these algorithms on datasets comprising data from patients with HBV-HCC and healthy controls, researchers may be able to identify particular microRNAs linked to the onset or course of the disease. 85
A previous study illustrated the use of an ML approach to determine potential miRNA signals from miRNA–mRNA regulatory network modules in HBV-HCC and hepatocyte remodeling via comprehensive analysis. In this study, a regulatory network was built to integrate the regulatory network interactions between miRNAs and mRNAs via the random forest method to rank possible signals. 86 In other words, through the application of computational approaches to analyze intricate gene-microRNA interactions, scientists can obtain a deeper understanding of the molecular mechanisms that underlie these activities, and this information may result in better HBV-HCC patient diagnosis, prognosis, and treatment regimens. 87
One ML technique, eXtreme Gradient Boosting (XGBoost), can identify significant genes that could lead to HCC. 88
In accordance with previous studies, model performance was assessed using balanced measures for accuracy, sensitivity, specificity, positive-predictive value, and negative-predictive value. For the XGBoost model, the accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and
Moreover, SVM-HCC identified a 23-miRNA signature that was associated with cancer stages in patients with HCC and achieved a 10-fold cross-validation accuracy, sensitivity, specificity, Matthews correlation coefficient, and area under the receiver operating characteristic curve (AUC) of 92.59%, .98, .74, .80, and .86, respectively, 89 and a test accuracy and AUC of 74.28% and 73%, respectively. A hepatitis B pipeline involving hsa-let-7i and its 13 integrated miRNAs was identified through a correlation evaluation of the miRNA profile and its jointly expressed miRNAs. 90
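The Matthews correlation coefficient (MCC) reported above summarizes all four cells of the confusion matrix in a single value between -1 and 1, which makes it informative when classes are imbalanced. The sketch below uses the standard definition; the counts passed to it are invented and are unrelated to the SVM-HCC results.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventionally reported as 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical confusion-matrix counts.
mcc = matthews_corrcoef(tp=3, tn=5, fp=1, fn=1)
print(round(mcc, 4))
```

For these counts the numerator is 14 and the denominator is 24, giving an MCC of about 0.58.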
Additionally, possible differentially expressed miRNAs from the Cancer Genome Atlas database were identified via an ML model. 91 Five genes associated with these miRNAs are NK3 homeobox 1 (NKX3-1), forkhead box O1 (FOXO1), nuclear receptor subfamily 4 group A member 3 (NR4A3), secreted frizzled-related protein 1 (SFRP1), and interleukin 6 signal transducer (IL6ST). 92 In the context of HBV-associated HCC, the genes NKX3-1, FOXO1, NR4A3, SFRP1, and IL6ST are important regulators of cellular processes including proliferation, differentiation, apoptosis, inflammation, and tumorigenesis. NKX3-1 is involved in cell proliferation, and its downregulation is linked to tumor progression. 93 In HCC, FOXO1 controls apoptosis and prevents tumor growth. NR4A3, which regulates cellular proliferation, is downregulated in HCC. 94
SFRP1 is an antagonist of the Wnt signaling pathway, and its downregulation is associated with dysregulated Wnt signaling during the development of HCC. 95 The interleukin-6 signaling system, in which IL6ST is involved, supports liver regeneration and, when dysregulated, can stimulate inflammation and cancer. 96
ML Approaches Revealed Long Noncoding RNAs (lncRNAs) and Their Related Genes as Genomic Signatures for HBV-Associated Hepatocyte Remodeling and HCC
The identification of lncRNAs from large genomic datasets as genetic markers for hepatitis B virus-associated hepatocyte remodeling and HCC can be accomplished via a variety of ML techniques, including SVM, random forests, and deep learning neural networks.97,98 LncRNAs can be grouped according to their expression patterns by clustering algorithms, including k-means clustering and hierarchical clustering, often combined with dimensionality reduction methods such as principal component analysis (PCA). 99 These algorithms can also be used to find any correlations between lncRNAs, HBV-associated hepatocyte remodeling, and HCC. 100
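To show how expression profiles can be grouped without labels, the following is a minimal k-means sketch. Each point stands for a hypothetical lncRNA described by its expression in two conditions; the values are invented, and the centroids are initialized from the first k points purely to keep the example deterministic, whereas practical implementations usually use randomized or k-means++ initialization.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, n_iter=20):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Simplifying assumption: the first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        for i, point in enumerate(points):
            distances = [squared_distance(point, c) for c in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical (condition 1, condition 2) expression pairs for six lncRNAs.
profiles = [
    (1.0, 1.2), (8.0, 8.5), (1.1, 0.9),
    (7.9, 8.2), (0.9, 1.0), (8.2, 7.8),
]
labels, centroids = k_means(profiles, k=2)
print(labels)
```

The low-expression and high-expression profiles fall into separate clusters, which is the kind of grouping that would then be examined for association with HBV-related phenotypes.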
The most significant lncRNAs that are predictive of HBV-associated hepatocyte remodeling and HCC can be identified via feature selection algorithms. 101 Additionally, network-based techniques like gene co-expression networks, protein-protein interaction networks, or regulatory networks can be used to identify putative relationships between lncRNAs and other genes or proteins implicated in HCC and hepatocyte remodeling linked to the HBV. 102
These ML algorithms can be trained on the expression levels of particular lncRNAs and other biological cues to discriminate between known cases of HCC and healthy controls. 103 After being trained, these models can be used to predict, on the basis of lncRNA profiles, whether newly diagnosed HBV patients are more likely to develop HCC. This method shows promise for enhancing the early identification of HBV-linked HCC and tailored therapy. 48
In a previous study, lncRNAs such as AL356056.2, AL445524.1, TRIM52-AS1, AC093642.1, EHMT2-AS1, AC003991.1, AC008040.1, LINC00844, and LINC01018 were selected via ML as possible diagnostic lncRNA biological signals for HCC associated with HBV. 104 Because the particular indicators and biological pathways relevant to HBV-HCC development for these genes are still being intensively explored, their functions in HBV-HCC have yet to be precisely described in the literature. 17 Nevertheless, probable roles can be inferred from their known characteristics and connections with other genes or proteins involved in cellular processes relevant to HCC and HBV. Since some long noncoding RNAs (lncRNAs) have been linked to chromatin remodeling and epigenetic control, lncRNAs like EHMT2-AS1 may be engaged in mechanisms that are comparable to those that lead to the development of HCC.105,106
TRIM52-AS1 may function as an antisense RNA for the TRIM52 gene, which is connected to immunological control and the antiviral response. As a result, it may affect how the host reacts to HBV infection.106,107 Each of these factors might have altered the gene expression linked to chronic inflammation inside the immune milieu of tumors. New information about the specific functions of lncRNAs in HCC associated with HBV was obtained through a functional annotation of the target DEmRNAs. 108
The specificity and sensitivity of the random forest and support vector machine models were 94.3% and 86.5% and 95.7% and 90.4%, respectively, as indicated by their respective areas under the curve (AUCs). The integrated coexpressed DEmRNAs matched similar pathways in the retinol metabolism, PI3K-Akt signaling cycle, biological cancer development, and p53 signaling processes, according to the findings of the functional enrichment investigation. 109
Non-coding RNA molecules known as “snoRNAs” are mostly involved in the processing and alteration of other RNA molecules. They play a role in several biological functions, including splicing, gene control, and ribosomal RNA processing. 110 The specific functions of different snoRNA species might, however, differ greatly from one another.
The activities of the tumor-educated platelet (TEP) snoRNAs SNORD12B, SNORD14E, and SNORA63 in the context of HBV-HCC are not well understood. 111 These snoRNAs may play a role in controlling gene expression or altering particular target RNAs related to HBV infection or the growth of HCC. In TEPs from patients with HBV-associated liver cancer, SNORA63 was upregulated and SNORD12B and SNORD14E were downregulated, and these snoRNAs may serve as diagnostic biomarkers for both the early stages of the disease and HBV-induced HCC. 112
Additionally, when paired with platelet characteristics, TEP SNORD12B, SNORD14E, and SNORA63 improve the testing efficacy of AFP and achieve good diagnostic effectiveness for HBV-induced HCC. SNORA63, SNORD14E, and SNORD12B, which are abnormally expressed in TEPs, may be new, non-invasive indicators for identifying HBV-related HCC. 113
One of the snoRNAs that is most obviously elevated in HBV-caused HCC is SNORD12B, and there is a strong correlation between elevated SNORD12B expression and unfavorable patient prognosis. 112 Significantly, SNORD12B aids in the development, circulation, assault, and dissemination of tumor hepatocytes both in vivo and in vitro during chronic hepatitis B virus infection, demonstrating its oncogenic character. 113 SNORA18L5 enhances the likelihood of HBV-related hepatocellular carcinoma by changing where ribosomal proteins are located and lowering p53 expression. Furthermore, through the miR-30a-5p/REEP3 pathway, CircFAT1 accelerates the expansion of cancer in hepatocytes. Circulating miR-483-5p, miR-21, and miR-155 may be novel prognostic and early diagnostic biomarkers for HCC, according to machine learning algorithms. 114
Among patients with recurrent HCC, the genes CETN2, HMGA1, RACGAP1, and SNRPB were elevated, and they can serve as prognostic biomarkers to predict the recurrence of HCC following surgical removal. 115 HBV-HCC has also been linked to three genes: TOP3B, SSBP3, and COX7A2L. Previous studies emphasized the significance of these biomarkers, which were consistently significant in a variety of models, indicating their potential to enhance the prediction accuracy for HCC prognosis.52,116
COX7A2L is a component of cytochrome c oxidase, which is vital to both oxidative phosphorylation and the electron transport chain. 117 Cytochrome c oxidase is essential to cellular energy production because it transfers electrons from cytochrome c to molecular oxygen, which in turn drives the synthesis of ATP. Changes in COX7A2L expression or function may affect the pathways involved in cellular metabolism and energy production in HBV-HCC. 118
Similarly, SSBP3 encodes a single-stranded DNA-binding protein that is involved in several DNA-related processes, including transcriptional control, repair, recombination, and replication. Its precise function in HBV-HCC is unknown; however, it might be related to controlling the processing or stability of viral or host genetic material. 119
Moreover, TOP3B encodes DNA topoisomerase III beta, an enzyme that catalyzes changes in the topological state of DNA molecules during cellular functions such as replication, recombination, and repair. 120 Changes in TOP3B activity or expression in HBV-HCC may influence genomic stability by affecting the resolution or processing of complex DNA structures. 121
In addition, a multivariate Cox regression model ultimately identified YWHAB, PPAT, and NOL10 as predictive biomarkers. YWHAB is involved in several biological functions, including signal transduction, cell cycle regulation, and apoptosis. 122 Through its interactions with viral proteins or cellular components linked to HBV replication and the formation of HCC, YWHAB facilitates the growth and progression of tumors in the context of HBV-HCC. 123 Moreover, YWHAB controls the activity of important carcinogenic pathways like the PI3K/AKT/mTOR pathway, which is commonly dysregulated in HCC. 124
An enzyme called phosphoribosyl pyrophosphate amidotransferase (PPAT) is involved in the metabolism of nucleotides and purines. In HBV-HCC, PPAT plays a role in the elevated need for nucleotides necessary for the fast growth of tumor cells brought on by HBV infection. 125 Infected hepatocytes with overexpressed PPAT have increased de novo purine production, which supports these needs and encourages tumor growth. Additionally, through its possible role in altering signaling pathways associated with cell survival or proliferation, PPAT may have non-canonical effects on the course of cancer. 126
Nucleolar Protein 10 (NOL10) is mostly found in the nucleolus, where it plays a role in ribosome synthesis, which is essential for regulating cellular development and responding to stress brought on by long-term infections such as HBV. 127 NOL10 influences ribosome biogenesis, which affects translation control within cells and results in the uncontrolled growth characteristic of cancer cells observed at advanced stages following long-term chronic hepatitis B infection. This may be a factor contributing to malignant transformation in HBV-HCC pathogenesis. 128
The regulation of YWHAB, PPAT, and NOL10 promoter methylation differed in malignant hepatocytes and cancers. Immune infiltration analysis indicated that B cells and dendritic cells were probably responsible for the aberrant modulation of these markers. 129 The important role of these markers in the growth of HCC was demonstrated by gene set variation analysis (GSVA). Findings based on 3 hub genes imply that the circTMCO3/miR-577/RHOA, circHMGCS1/miR-892a/KIF5B, and circHMGCS1/miR-581/AURKA pathways may be essential for the development of HCC. Both in vitro and in vivo carcinogenesis were significantly reduced when the expression of these genes was specifically inhibited. 130
A transcriptomics study of The Cancer Genome Atlas (TCGA) dataset revealed that GABRR1, SOX11, COL24A1, and MYLK2 were markedly upregulated in cancer cells. In particular, Kaplan-Meier analysis revealed an unfavorable relationship between the mRNA levels of these genes and the overall survival (OS) and progression-free survival (PFS) of HCC patients.131 -134 These genes may therefore help predict the survival of HCC patients. Additional research revealed the significance of the GABRP, HBG1, and DAK (TKFC) genes in HCC.135,136 BIRC5, CDC20, and UBE2C demonstrated substantial associations with OS. As a result, these 3 hub genes were identified and confirmed to be associated with the course and outcome of HCC.137,138 They might be possible HCC treatment targets. According to a survival study, increased GLMP, SLC38A6, and WDR76 mRNA expression levels were linked to poor prognosis, and the combined expression of these 3 genes was found to be an independent predictor of outcome in patients with HBV-HCC. 139
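The Kaplan-Meier comparison mentioned above can be illustrated with a minimal sketch. The function below implements the standard product-limit estimator in plain Python. The follow-up times, event flags, and the split into high- and low-expression groups are invented for illustration and are not taken from TCGA or from the cited studies.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, survival probability)] using the product-limit estimator.

    times  : follow-up time for each patient (e.g., months)
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Deaths at time t are counted before censoring at the same time.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Hypothetical cohort split by expression of a candidate gene (illustrative values only).
high_expr = kaplan_meier([5, 8, 12, 12, 20, 30], [1, 1, 1, 0, 1, 0])
low_expr = kaplan_meier([10, 25, 32, 40, 48, 60], [0, 1, 0, 1, 0, 0])
```

In practice, the two curves would be compared with a log-rank test, and a dedicated survival analysis package would normally be used instead of hand-written code.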
Five cancer suppressors that were confirmed to be hsa-miR-183-5p target genes (AKAP12, DYRK2, FOXN3, FOXO1, and LATS2) had their expression restored by LNC-HC. 140 All things considered, human LNC-HC was found to be a new tumor suppressor that, through competitively binding hsa-miR-183-5p as a competing endogenous RNA (ceRNA), could decrease HCC cell expansion in vitro and limit tumor development in vivo. These results imply that LNC-HC may function as an indicator for HCC and offer an intriguing therapeutic target for the disease’s management. 85
Higher levels of CASC7 may be associated with several clinical variables in the emergence of HCC, and the systemic levels of CASC7 and FOXA1 are anticipated to grow in significance as a diagnostic tool for HCC and as a way to track the course of the illness. 141 Especially in male individuals with hepatocellular carcinoma, FOXA1 suppresses PIK3R1 activation to slow the progression of the cancer. miR-212-3p, a tumor suppressor that is inhibited in HCC, may be able to block the expression of FOXA1. FOXA1 carried out its biological role by controlling the expression of AGR2. The aberrant overexpression of FOXA1 might be attributed to low levels of miR-212-3p. 142
Furthermore, the production of CASC7 had an adverse association with the generation of miR-30a-5p in HCC tumor hepatocytes. Additionally, CASC7 and miR-30a-5p modulated KLF10, a target of miR-30a-5p. By coupling to miR-30a-5p, CASC7 controlled the KLF10/TGF-β/SMAD3 route, which enhanced the growth of HCC cells. 142
To identify diagnostic biomarkers for HBV-related hepatocyte remodeling, other algorithms such as LASSO regression, SVM-RFE, RF, and WGCNA are employed. 108 Based on these investigations, along with their receiver operating characteristic curves, the genes encoding versican (VCAN) and phosphatidic acid phosphatase type 2C (PPAP2C) were shown to be potentially useful diagnostic indicators for HBV-hepatocyte remodeling. 27 Moreover, SPINK1 was also shown to be an independent predictor of overall survival for patients with HCC by multivariate analysis. 143
By utilizing ML, it is now possible to examine novel forms of clinical data to identify HCC. Genes including FCN3, CLEC1B, and PRC1 were linked with overall survival, progression-free survival, and disease-free survival, according to the large-scale transcriptome profiling data analysis. 144 Using ML techniques, important therapeutic targets associated with HCC in its early and late stages were also discovered, and vitronectin, lactate dehydrogenase D (LDHD), thrombin-activable fibrinolysis inhibitor, and miR-590 were among the key players in the early stage of HBV-HCC, while regucalcin, the SPRY domain containing 4, miR-3199-1, miR-194-2, and miR-4999 were involved in the late stage of HBV-HCC. 143
Unique transcriptome signatures can be discovered with the use of ML techniques in conjunction with RNA sequencing data. In distinguishing between the HCC and normal cell models, the 3 transcripts that were identified, PARP2-202, SPON2-203, and CYREN-211, showed the greatest accuracy of all transcripts examined. 145 The approach used in this investigation can be applied to any RNA sequencing dataset to identify new transcriptional indicators. 146
PARP2-202 may contribute to the survival and proliferation of HCC cells by repairing DNA and maintaining genomic stability. It also interacts with biological components linked to HBV replication or viral proteins. 147 Conversely, it has been proposed that SPON2-203 contributes to tumor growth and dissemination and that elevated levels of this protein are associated with a worse prognosis. 148 It has been suggested that CYREN-211 may have a role in the development of tumors by modifying the course of the cell cycle through interactions with important regulatory proteins that govern the G1/S transition. 149
Research has shown that human cartilage contains an extracellular matrix glycoprotein called cartilage oligomeric matrix protein (COMP), which controls cell phenotype throughout tissue development and transformation. 150 It has been observed that COMP is not detected in normal hepatic tissues and that its level is markedly elevated in HCC cells. COMP interacts with CD36 and is subsequently critical for the advancement of MEK/ERK and PI3K/AKT-mediated HCC. Increased COMP is a useful indicator of the likelihood of developing HCC as well as a diagnostic marker for the disease. 151
Hepatic stellate cells (HSCs) generate a glycoprotein known as Mac-2 binding protein (M2BP), which is extensively glycosylated to form the Mac-2 binding protein glycosylation isomer (M2BPGi). As a mediator linking HSCs and Kupffer cells, M2BPGi may interact with extracellular matrix proteins such as galectin-3 (Mac-2) and has been demonstrated to help determine the extent of cirrhosis in long-term liver disease. According to recent research, M2BPGi may be able to predict the progression of HCC. 152
Furthermore, according to the prediction model, combining the data from serum Gal-3BP and alpha-fetoprotein (AFP) increased the accuracy of the initial HCC diagnosis from 90.63% to 95%, the specificity from 93.75% to 95%, and the sensitivity from 87.5% to 95%. According to these findings, serum Gal-3BP level is a possible indicator for identifying HBV-induced HCC, and using serum Gal-3BP together with AFP, rather than AFP alone as in current clinical practice, improves the ability to diagnose HBV-HCC.86,153
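For readers less familiar with these three metrics, the short sketch below shows how accuracy, sensitivity, and specificity follow from confusion-matrix counts. The counts are hypothetical: a cohort of 32 HCC patients and 32 controls is assumed only because it reproduces the baseline percentages quoted above (90.63% accuracy, 93.75% specificity, 87.5% sensitivity). The cohort sizes of the cited study are not given here.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate among patients
        "specificity": tn / (tn + fp),  # true-negative rate among controls
    }

# Hypothetical counts (32 HCC patients, 32 controls); NOT study data.
baseline = diagnostic_metrics(tp=28, fn=4, tn=30, fp=2)
for name, value in baseline.items():
    print(f"{name}: {value:.4f}")
```

Because accuracy depends on the ratio of patients to controls, sensitivity and specificity are usually the more transferable figures when comparing marker panels across cohorts.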
In HBV-HCC and hepatocyte remodeling, sex-specific gene biomarkers can also be found using machine learning techniques. It is well recognized that there are sex differences in HBV-caused liver cancer. By finding sex-specific genetic signatures, researchers can learn more about the molecular pathways that underlie the onset, development, and progression of HBV-HCC as well as hepatocyte remodeling. 153
A recent study on sex-specific transcriptome analysis reveals gender-related dysregulated pathways in human HCC, highlighting the usefulness of machine learning in the identification of sex-specific gene biomarkers. To find gender-related dysregulated pathways in HCC patients, the authors of this work employed a cross-validation strategy in conjunction with a machine-learning technique known as Random Forests. 86
In addition, examining the sex-related variations in the expression and operation of predicted genes implicated in hepatocyte remodeling and HCC linked to HBV is a fascinating area of study. 154 It is becoming better acknowledged that sex has a biological role in liver illnesses, such as HCC and HBV infection. In several liver illnesses, sex-specific variations in immune responses, gene expression, and disease progression have been noted. 155
Compared to women, men are more likely to be diagnosed with and experience aggressive HCC. A growing body of research indicates that the protective effects of estrogen and the stimulatory effects of androgens in the development and progression of HCC appear to mediate the sex discrepancy. 156 The effect of sex hormones on the transactivation of the hepatitis B virus X protein and the release of inflammatory cytokines has been the primary focus of studies on the sex difference in HCC over the past few decades, and these studies have recently become even more intense. 139 Sex hormones connect to certain cellular receptors in hepatocytes and alter the related signaling pathways, which in turn affect genetic modifications and DNA damage repair. 157
Certain genes may be expressed or regulated differently in males and females in the setting of HBV-associated hepatocyte remodeling and HCC. Comprehending these variations can offer valuable perspectives on the molecular processes that underlie gender differences in illness susceptibility, progression, and response to treatment. 158
Examining extensive omics data (transcriptomics or proteomics) via an ML approach from patients with HBV-related liver disorders, both male and female, may identify putative sex-specific gene signatures linked to carcinogenesis and hepatocyte remodeling. 159 Further understanding of the underlying mechanisms causing sex inequalities in these illnesses may come from examining the effects of sex hormones on host immunological responses, HBV replication, and tumor formation. 160
A sex-specific differential expression analysis of the tumor and tumor-adjacent tissues identified genes and pathways that are relevant to the etiology of HCC and that differentiate male from female HCC. Males showed PI3K, PI3K/AKT, FGFR, EGFR, NGF, GF1R, Rap1, DAP12, and IL-2 signaling pathway enrichment, while both sexes showed activation of pathways related to apoptosis and the cell cycle.161 -163
Moreover, the incidence of HCC is greater in men than in women. According to a recent study published in Gut, the development of HCC is significantly inhibited by CYP39A1, a liver-specific autosomal gene that is preferentially expressed in females. 164 According to expression quantitative trait loci (eQTL) analysis, males and females differ in the activation of several signaling pathways. 140 Sex has a significant role in modulating the effects of eQTLs in HCC, as seen by the 24.3% of identified eQTLs that show differential effects between the sexes. The genes exhibiting sex-specific dysregulation in tumors and those carrying a sex-specific eQTL converge in clinically significant pathways, implying that differential genetic effects on gene expression play a role in the molecular etiologies of male and female HCC.165,166
Sex-specific molecular etiologies of HCC are identified via a sex-stratified approach. These findings provide a fresh understanding of how inherited genetic transcription control influences sex differences in the etiology of HCC and establish a foundation for further research on sex-biased malignancies. 161 In general, investigating the sex-specific features of gene expression patterns in HCC and HBV-associated hepatocyte remodeling may provide significant biological insights with ramifications for patient-specific personalized medicine strategies for both male and female patients.161,167
In the ML and bioinformatics-based search for biomarkers for HBV-HCC, multi-center longitudinal studies are essential for several reasons. First, longitudinal studies enable the monitoring of biomarker performance over time, which is crucial for understanding biomarker predictive capabilities at various stages of disease progression. 168 This approach enhances the reliability of results by establishing temporal relationships between biomarker levels and the development of HCC. 169 Furthermore, these investigations are essential for validating new biomarkers identified by ML algorithms. Although many biomarkers have shown promise in early research, they have not been sufficiently validated across diverse cohorts. 29 For instance, the Early Detection Research Network (EDRN) of the National Cancer Institute emphasizes the importance of extensive biorepositories and longitudinal data collection to facilitate the discovery and validation of novel biomarkers in high-risk populations 170 (Table 1).
Table 1. Role of major gene signatures in HBV-induced hepatocyte remodeling and HCC.
Abbreviations: HBV: hepatitis B virus; HCC: hepatocellular carcinoma.
Table 1 summarizes the major genomic signatures with their respective functions, which were identified by the machine learning approach.
Bioinformatics Analysis Revealed Genomic Signatures for Hepatitis B Virus-Associated Hepatocyte Remodeling and HCC
By combining several omics datasets (genomics, transcriptomics, proteomics, and metabolomics), using a variety of statistical techniques, and utilizing ML algorithms, bioinformatics analysis facilitates the identification of important genetic signatures for HBV-HCC and hepatocyte remodeling. 178 Using a comprehensive approach, it is possible to find potential biomarkers for diagnostic or treatment targets in addition to learning more about the molecular pathways behind the development and course of the disease 113 (Figure 2).

Figure 2. The analysis of large-scale multi-omics datasets via a bioinformatics approach to discover key genomic signatures for HBV-HCC and hepatocyte remodeling.
Bioinformatics tools are crucial for identifying genomic biomarkers associated with HBV-HCC by leveraging extensive data from genomic databases such as the National Center for Biotechnology Information (NCBI), GEO, and TCGA. 52 These tools facilitate the identification of DEGs through differential RNA-Seq analysis using libraries like the limma and edgeR packages in R. 183
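The idea behind DEG screening can be sketched in a few lines. The example below is not limma or edgeR, which are R packages that use moderated statistics and count-based models. It is a simplified stand-in that uses a log2 fold change and Welch's t statistic in plain Python. The gene labels, expression values, and thresholds are all invented for illustration.

```python
import math
from statistics import mean, variance

def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean tumor expression to mean normal expression."""
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))

def welch_t(tumor, normal):
    """Welch's t statistic for two groups with possibly unequal variances."""
    se = math.sqrt(variance(tumor) / len(tumor) + variance(normal) / len(normal))
    return (mean(tumor) - mean(normal)) / se

def candidate_degs(expression, min_abs_lfc=1.0, min_abs_t=3.0):
    """Return genes passing both thresholds (thresholds are illustrative)."""
    hits = {}
    for gene, (tumor, normal) in expression.items():
        lfc = log2_fold_change(tumor, normal)
        t = welch_t(tumor, normal)
        if abs(lfc) >= min_abs_lfc and abs(t) >= min_abs_t:
            hits[gene] = (lfc, t)
    return hits

# Invented normalized expression: gene -> (tumor samples, normal samples).
expression = {
    "GENE_UP": ([90, 110, 100, 95], [20, 25, 22, 18]),
    "GENE_DOWN": ([10, 12, 9, 11], [60, 55, 65, 58]),
    "GENE_FLAT": ([50, 52, 48, 51], [49, 51, 50, 47]),
}
degs = candidate_degs(expression)
```

A real analysis would convert the statistics to p-values and correct them for multiple testing across thousands of genes, which this sketch omits.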
Open-source software and repositories, including cBioPortal, ArrayExpress, and GEO, enable comprehensive cancer omics data analysis. Additionally, web servers such as Gene Expression Profiling Interactive Analysis (GEPIA), Tumor Immune Estimation Resource (TIMER), and Oncomine support the analysis of associations between clinical factors and gene expression in tumors, aiding in the identification of prognostic biomarkers. 184 Furthermore, pathway analysis, network analysis, and multi-omics data integration enhance biomarker discovery and patient prognosis prediction. Algorithms for sequence alignment and variant calling are essential for identifying genetic variations and mutations while examining gene functionalities. 185
Furthermore, numerous bioinformatic investigations have led to the identification of prospective miRNAs with varying expression patterns and target genes. 180 A possible miRNA-mRNA regulatory loop including miR-93-5p-JUN/STAT3 (Jun proto-oncogene, AP-1 transcription factor subunit/signal transducer and activator of transcription 3) pathway, miR-106b-5p-STAT3 pathway, miR-21-5p-STAT3/PIK3R1 (phosphoinositide-3-kinase regulatory subunit 1) pathway, miR-125b-5p-E2F2/E2F3 pathway, and miR-let7c-5p-NRAS (neuroblastoma ras viral oncogene homolog) pathway has been developed by matching miRNA-mRNA pairs in the miRNet dataset.112,186
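The regulatory loop described above can be treated as a small bipartite network. The sketch below transcribes the listed miRNA-target pairs into a dictionary and inverts it to find genes regulated by more than one miRNA. It is only an illustration of the data structure. It does not query miRNet, and the helper names are hypothetical.

```python
from collections import defaultdict

# miRNA -> target genes, transcribed from the axes listed in the paragraph above.
MIRNA_TARGETS = {
    "miR-93-5p": ["JUN", "STAT3"],
    "miR-106b-5p": ["STAT3"],
    "miR-21-5p": ["STAT3", "PIK3R1"],
    "miR-125b-5p": ["E2F2", "E2F3"],
    "miR-let7c-5p": ["NRAS"],
}

def invert_network(mirna_targets):
    """Map each target gene to the sorted list of miRNAs that regulate it."""
    regulators = defaultdict(list)
    for mirna, genes in mirna_targets.items():
        for gene in genes:
            regulators[gene].append(mirna)
    return {gene: sorted(mirnas) for gene, mirnas in regulators.items()}

def shared_targets(mirna_targets, min_regulators=2):
    """Genes targeted by at least `min_regulators` distinct miRNAs."""
    inverted = invert_network(mirna_targets)
    return {g: m for g, m in inverted.items() if len(m) >= min_regulators}

hubs = shared_targets(MIRNA_TARGETS)
print(hubs)
```

In this small network, STAT3 is the only gene shared by several miRNAs, which is the kind of convergence that network analyses use to prioritize candidate hub genes.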
In addition, elevated expression of the miRNAs miR-130b-5p, miR-320d, miR-483-3p, miR-1246, miR-320b, miR-192-5p, miR-4532, miR-320c, miR-483-5p, and miR-122-5p may affect expression across various HCC pathways, according to KEGG bioinformatic analysis (mirPATHv4), which distinguishes verified targets and pathways in HCC. Moreover, the overexpression of miR-483-5p regulates the expression of the target genes PPARα/TIMP2 (peroxisome proliferator-activated receptor alpha/tissue inhibitor of metalloproteinases 2) and CDK15 (cyclin-dependent kinase 15). 187
During HBV-HCC and hepatocyte remodeling, transcription within the miRNA-mRNA regulatory network is carried out by RNA polymerase enzymes and controlled by the chromatin environment, nucleosome occupancy, histone modifications, transcription factor availability, and regulatory elements. 113 Moreover, researchers can find genomic markers for HBV-HCC and hepatocyte remodeling by using bioinformatics analysis of the miRNA-mRNA regulatory network, which also sheds light on the molecular mechanisms underlying the progression of the disease. The ramifications of these results extend to diagnosis, prognosis, and possible therapies aimed at certain dysregulated pathways. 52 For miRNA-miRNA interaction analysis, Bowtie, Burrows-Wheeler Aligner (BWA), SAMtools, and Genome Analysis Toolkit (GATK) are frequently utilized bioinformatics tools. 188 BWA is a tool for mapping low-divergent sequences to a large reference genome, while Bowtie is a short-read aligner used to align sequencing reads to a reference genome. 189 In addition, the GATK software package was created at the Broad Institute to examine next-generation sequencing data, and SAMtools is a suite of programs for working with high-throughput sequencing data 190 (Figure 3).

Figure 3. The application of bioinformatics analysis to determine the potential miRNA signals for HBV-HCC and hepatocyte remodeling from miRNA-miRNA interaction networks.
The bioinformatics study was consistent with the expression verification of the 5 miRNAs. In particular, hsa-miR-10b-5p and hsa-miR-10b-3p have high prognostic value in patients with HCC. Therefore, the 5 differentially expressed miRNAs could be used as clinical indicators for patients with HBV-HCC. Furthermore, the carcinogenesis of HCC may be influenced by the varying degrees of expression of the targets of these 5 miRNAs, which include SFRP1, EDNRB, NR4A3, FHL2, NKX3-1, IL6ST, and FOXO1. 191
Furthermore, bioinformatic analysis-based simultaneous evaluation of erratically methylated DEGs may offer novel insights into the epigenetic process underlying HCC. Hub genes such as S100A9 (S100 calcium-binding protein A9), CHEK1 (checkpoint kinase 1), KIF11 (kinesin family member 11), PBK (PDZ binding kinase), and MCM3 (minichromosome maintenance complex component 3) may function as sensors to enable an accurate diagnosis of HCC. 192
Moreover, according to earlier research, the ubiquitin-conjugating enzyme E2C (UBE2C) is implicated in HBV-HCC as an oncogene. 144
A thorough bioinformatic analysis was initially used to examine UBE2C activation in HCC. The diagnostic and predictive functions of UBE2C in the hepatocarcinogenesis of HBV-caused HCC, as well as the UBE2C promoter methylation rate and upstream regulatory miRNAs of UBE2C in HCC, were evaluated via receiver operating characteristic (ROC) curve modeling and survival analysis. 193
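As a minimal illustration of ROC curve modeling for a single marker, the sketch below computes the area under the curve (AUC) as the probability that a randomly chosen case scores higher than a randomly chosen control, and lists the ROC points at each threshold. The scores and labels are invented. They are not UBE2C measurements from the cited study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outranks a random control.

    Equivalent to the area under the ROC curve; ties count as half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def roc_points(scores, labels):
    """(false-positive rate, true-positive rate) at each distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Invented expression scores for a candidate marker (1 = HCC, 0 = control).
scores = [8.1, 7.4, 6.9, 6.2, 5.8, 5.5, 4.9, 4.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
points = roc_points(scores, labels)
```

An AUC of 0.5 corresponds to a marker with no discriminative value, and values approaching 1.0 indicate near-perfect separation of cases from controls.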
UBE2C was substantially more abundant in patients with HCC than in normal controls. In HCC, there was also a clear decrease in UBE2C promoter methylation, which was inversely linked with UBE2C mRNA expression. The combination of an in vitro experiment and a bioinformatic correlation analysis suggested that hsa-miR-193b-3p could be a significant additional upstream regulator of UBE2C in HCC. These findings suggest that UBE2C is elevated in HCC and could be a crucial biomarker for prognosis and diagnosis in individuals with HCC. 194
A previous study also screened for miRNA gene variants that indirectly target mRNA transcripts via the regulation of hepatitis B infection by characterizing the functional effects of single nucleotide polymorphisms (SNPs) on mRNA-miRNA and miRNA-binding region interactions; most SNPs were found in hsa-miR-139-3p. 195
Specific changes were observed in miRNAs associated with HBV-related HCC, including miR-150, miR-342-3p, miR-663, miR-20b, miR-92a-3p, miR-376c-3p, and miR-92b. On the basis of GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis, these HBV-related HCC miRNAs are predicted to regulate transcription, the promoter region of RNA polymerase II, focal binding, activation of proteins via the MAPK (mitogen-activated protein kinase) signaling pathway, and the actin superstructure. Additionally, according to IPA (Ingenuity Pathway Analysis), these miRNAs have a substantial effect on HCC incidence and HBV infection through their actions on the AGO2 (argonaute RISC catalytic component 2), TP53 (tumor protein p53), and CCND1 (cyclin D1) genes. 196
Several miRNA-mRNA axes have also been found in HCC caused by HBV. For HBV-related HCC, hsa-let-7c-3p/CKS2 (cyclin-dependent kinase regulatory subunit 2), hsa-mir-195-5p/CDK1, and hsa-mir-5589-3p/CCNB1 may be used as predictive markers and therapeutic targets. 197 Previous investigation has shown that let-7a, miR-34a, and miR-199a/b function as tumor suppressors and have unique interactions in several immunological pathways. Hence, they serve as significant HBV-associated signatures. 174
Furthermore, in HBV-induced HCC, miRNA-99a-5p is aberrantly expressed and may alter the metabolic processes of hepatocellular carcinoma cells. Compared with healthy tissue, cancerous tissue expresses this miRNA at a substantially reduced level. According to the KEGG and GO analysis, miRNA-99a-5p operates through several pathways. ROC analysis also revealed that it has much promise for prognostic prediction in HCC. 198
The findings of the upregulation and knockdown experiments verified that miRNA-99a-5p could prevent malignant hepatocytes from proliferating, suggesting that it might play a significant role as a tumor suppressor in HCC. 199 These findings suggest that miRNA-99a-5p is adversely associated with the advancement of HCC in HBV-infected individuals and may serve as an innovative treatment option for HCC. 200
According to the clusterProfiler R package, KEGG, and GO enrichment studies of DEGs, the aberrant expression of DEGs, including CDK1 (cyclin-dependent kinase 1), CDC20 (cell division cycle protein 20), CCNB1 (cyclin B1), CENPF (centromere protein F), and MAD2L1 (mitotic arrest deficient 2 like 1) is instructive for early detection, tumor phase classification, and prediction of the poor prognosis of HBV-caused hepatic malignancy. 201
The results of the GO enrichment study demonstrated that these DEGs had substantial increases in molecular functions, such as binding to calcium ions, protein kinases, DNA, and heme, and in physiological processes such as division, growth, redox processes, immune response, and proteolysis. These DEGs were shown to be considerably enriched in the framework of KEGG and GO investigations for the development of cells, oocyte meiosis, metabolic routes, antibiotic biosynthesis, and the p53 pathway. 202
On the other hand, the development and progression of HBV-related HCC may also be significantly influenced by CDK1, KIF11 (kinesin family member 11), KIF20A (kinesin family member 20A), CCNB1, NDC80 (NDC80 kinetochore complex component), MCM2 (minichromosome maintenance complex component 2), RFC4 (replication factor C subunit 4), and RRM2 (ribonucleotide reductase regulatory subunit M2). 203
KIF11 and KIF20A are involved in cell division and mitosis, and CDK1 is involved in the regulation of the cell cycle. CCNB1 is also a crucial cell cycle regulator, and NDC80 is linked to chromosomal segregation during mitosis.204,205 MCM2 plays a role in the start of DNA replication, RRM2 is involved in nucleotide synthesis, and RFC4 is involved in DNA replication and repair activities. Together, dysregulation of these proteins disrupts cellular processes, which may result in the uncontrolled growth traits that are subsequently observed in the course of a long-term, chronic hepatitis B infection. Thus, these proteins play a significant role in the initiation and advancement of HBV-associated HCC. 206 According to PPI (protein-protein interaction) network modeling, they may also serve as possible therapeutic targets and screening biomarkers for HBV-associated HCC. 207
In addition, on the basis of integrated bioinformatics and survival analysis, the poor overall survival and pathogenesis of HCC patients were linked to elevated gene expression of CDC20, MAD2L1, BUB1 (mitotic checkpoint serine/threonine kinase), BUB1B, AURKA (aurora kinase A), TOP2A (topoisomerase II alpha), CCNB2, TPX2 (TPX2 microtubule nucleation factor), CDC7, and MCM3 (minichromosome maintenance complex component 3).208 -210
CDC20, MAD2L1, BUB1, BUB1B, AURKA, TOP2A, CCNB2, TPX2, CDC7, and MCM3 are all proteins with significant biological roles in HBV-HCC. Cell cycle progression is regulated in part by CDC20. 211 During cell division, MAD2L1 facilitates appropriate chromosomal segregation by participating in the spindle assembly checkpoint. 212 Additionally, BUB1 and BUB1B are crucial for appropriate chromosomal segregation during mitosis. 213
AURKA controls centrosome development and the formation of mitotic spindles. TOP2A is essential in both chromosomal condensation and DNA replication. Another important cell cycle regulator that manages the start of mitosis is CCNB2.214,215 TPX2 plays a role in spindle assembly by nucleating microtubules. 216 CDC7 plays a crucial role in the initiation of DNA replication. 217 Moreover, MCM3 plays a crucial role in the pre-replicative complex, which controls the replication of DNA. 218
Collectively, these proteins are involved in dysregulated cellular processes that could result in the uncontrollable growth characteristics observed at advanced stages after a protracted, chronic infection with hepatitis B. Therefore, these proteins play a significant role in the onset and advancement of HBV-associated HCC. 211
According to genomics and empirical analysis, a high tripartite motif-containing 28 (TRIM28) expression level was linked to T classification, clinical stage, histopathological grade, and circulating AFP levels, and it was found to be an independent risk indicator for an unfavorable outcome in patients with HBV-caused HCC. 219 TRIM28 expression is closely linked to the mechanism of ligand-receptor interaction, and its upregulation may inhibit dendritic cell (DC) activation. Therefore, in HBV-associated hepatic malignancies, increased expression of TRIM28 may be a possible prognostic biomarker associated with immunological infiltration. 220 In contrast, the expression of CYP2C8 (cytochrome P450 family 2 subfamily C member 8) is downregulated in HCC and malignant hepatocytes. 221
Identifying biomarkers for HBV-HCC requires the integration of computational findings with experimental validation. Bioinformatics analyses enable the processing of large datasets and the identification of significant patterns related to gene expression and disease progression. 222 For example, researchers have utilized bioinformatics tools to develop prognostic models based on gene signatures, which are subsequently validated through experimental techniques such as Western blotting and quantitative PCR to ensure their clinical applicability. This combined strategy not only enhances the reliability of biomarker candidates but also improves their utility in predicting patient outcomes and guiding treatment plans. Ultimately, the combination of computational insights and laboratory validation provides a stronger foundation for biomarker development in HCC, paving the way for improved diagnostic and therapeutic interventions. 223
Data Standardization
In the realm of bioinformatics and ML techniques, data standardization is vital, particularly when analyzing genetic markers associated with HCC and HBV-induced hepatocyte remodeling. 133 Variability in data processing and experimental protocols can lead to discrepancies that complicate the replication and comparison of findings across different studies. By standardizing these procedures, significant genetic signatures can be identified, ensuring that data collected from various sources can be effectively integrated and analyzed. 34 This consistency not only fosters collaborative research efforts but also enhances the reliability of results, allowing scientists to confidently build upon one another’s work. Ultimately, addressing variability through data standardization is essential for improving clinical outcomes and deepening our understanding of hepatitis B virus pathophysiology. 224
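One simple form of standardization is to z-score each gene within each cohort, so that measurements made on different scales become comparable. The sketch below illustrates this with invented values. The cohort names, gene label, and helper functions are hypothetical, and per-cohort z-scoring is only one option. It removes scale differences but does not correct every batch effect, and it can mask real differences when cohorts differ in composition.

```python
from statistics import mean, pstdev

def zscore(values):
    """Center to mean 0 and scale to unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant feature carries no signal
    return [(v - mu) / sigma for v in values]

def standardize_by_cohort(cohorts):
    """Apply z-scoring separately within each cohort for each gene."""
    return {
        name: {gene: zscore(vals) for gene, vals in genes.items()}
        for name, genes in cohorts.items()
    }

# Invented measurements of one gene on two platforms with different scales.
cohorts = {
    "cohort_A": {"GENE_X": [2.0, 4.0, 6.0, 8.0]},
    "cohort_B": {"GENE_X": [200.0, 400.0, 600.0, 800.0]},
}
standardized = standardize_by_cohort(cohorts)
```

After standardization, the two cohorts yield the same relative profile for the gene even though the raw values differ by a factor of 100.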
Clinical Validation of Discovered Biomarkers for Patients with HBV-HCC
For predictive models to be both accurate and useful in actual clinical situations, clinical validation is essential in ML. 225 To establish the efficacy and reliability of the discovered biomarkers in HCC surveillance among patients with chronic HBV infection, further validation through extensive clinical trials is necessary. This stage is crucial for enhancing early detection and ultimately improving patient outcomes. 226
However, the transfer from theoretical models to clinical practice requires thorough validation against external cohorts to ensure dependability and effectiveness in a range of patient populations. 227 For instance, a 2-stage ML model has demonstrated strong performance across multi-center datasets, providing recommendations for initial treatments and forecasting post-treatment survival for HCC patients. Additionally, models utilizing readily accessible clinical characteristics, such as tumor size and α-fetoprotein levels, have proven effective in accurately predicting the likelihood of microvascular invasion before surgery, which is crucial for tailoring individualized treatment plans for patients. 228
Both internal and external datasets are typically used in the validation process to ensure the generalizability and robustness of the models. For example, an ensemble ML model was employed to predict early mortality among HCC patients with bone metastases, yielding promising results that could enhance clinical decision-making. 229 Additionally, a variety of ML techniques have been applied to genetic data and imaging features, improving diagnostic accuracy in distinguishing benign from malignant liver lesions. 103 These advancements underscore the potential of ML to enhance clinical management strategies for HBV-associated HCC, paving the way for precision medicine approaches based on validated genomic biomarkers. 230
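The comparison of internal and external performance can be sketched as follows. This Python example applies a frozen risk score to an internal and an external cohort and computes the area under the ROC curve (AUC) for each. It is a minimal illustration under stated assumptions: the weights, feature names, and patient values are hypothetical and are not taken from any of the cited models.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def risk_score(patient, weights):
    """Linear risk score; the weights are fixed before validation."""
    return sum(weights[name] * patient[name] for name in weights)


# Hypothetical weights, frozen after training on the internal cohort.
FROZEN_WEIGHTS = {"tumor_size_cm": 0.5, "log10_afp": 1.0}

# Hypothetical patients: (features, outcome), with outcome 1 = event.
internal_cohort = [
    ({"tumor_size_cm": 2.0, "log10_afp": 1.0}, 0),
    ({"tumor_size_cm": 3.0, "log10_afp": 1.5}, 0),
    ({"tumor_size_cm": 5.0, "log10_afp": 2.5}, 1),
    ({"tumor_size_cm": 6.0, "log10_afp": 3.0}, 1),
]
external_cohort = [
    ({"tumor_size_cm": 2.5, "log10_afp": 1.2}, 0),
    ({"tumor_size_cm": 5.5, "log10_afp": 2.0}, 0),
    ({"tumor_size_cm": 4.0, "log10_afp": 2.2}, 1),
    ({"tumor_size_cm": 7.0, "log10_afp": 3.1}, 1),
]


def evaluate(cohort, weights):
    labels = [outcome for _, outcome in cohort]
    scores = [risk_score(features, weights) for features, _ in cohort]
    return auc(labels, scores)


auc_internal = evaluate(internal_cohort, FROZEN_WEIGHTS)
auc_external = evaluate(external_cohort, FROZEN_WEIGHTS)
print(f"internal AUC = {auc_internal:.2f}, external AUC = {auc_external:.2f}")
```

In this toy setting the score separates the internal cohort perfectly but misranks one pair of external patients. A gap of this kind between internal and external performance is what external validation is designed to expose.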
Strategies for Avoiding Overfitting in ML Models
Overfitting is a major risk in ML models, particularly when searching for biomarkers associated with HBV-HCC. It occurs when a model learns the noise and fluctuations in the training data rather than the underlying patterns, resulting in poor generalization to unseen data. 231 In studies that mine gene expression data for potential biomarkers, overly complex models may perform well during training but poorly during validation or testing. 180 Strategies such as cross-validation and regularization are essential to mitigate overfitting, especially when dealing with small or non-diverse datasets. By carefully balancing model complexity and performance, we can increase the accuracy of biomarker discovery for HBV-HCC, ultimately improving clinical outcomes. 29
Several strategies help prevent overfitting during genomic biomarker discovery for HBV-HCC. Feature selection reduces noise and enhances model performance by lowering the dimensionality of the data and retaining only the most relevant features. 232 It can be carried out with techniques such as regularization and recursive feature elimination. Cross-validation is crucial for assessing the robustness of the model; by repeatedly splitting the data into training and validation sets, we can check that the model performs well when applied to new data. Finally, a diverse and sufficiently large dataset helps reduce overfitting and enhances the reliability of biomarker discovery by providing a more representative sample of the underlying population. 146
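The interplay between feature selection and cross-validation can be sketched in a few lines of Python. In this minimal example, features are re-selected inside every training fold, so the held-out samples never influence which features are kept. The mean-difference filter and the nearest-centroid classifier are simple stand-ins chosen for brevity (they are not recursive feature elimination, and not the methods of the cited studies), and the expression matrix is hypothetical.

```python
from statistics import mean


def select_top_features(X, y, n_keep):
    """Rank features by the absolute difference between class means
    (a simple filter) and return the indices of the top n_keep."""
    def gap(j):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        return abs(mean(pos) - mean(neg))

    ranked = sorted(range(len(X[0])), key=gap, reverse=True)
    return ranked[:n_keep]


def fit_centroids(X, y, features):
    """Compute one centroid per class over the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean([r[j] for r in rows]) for j in features]
    return centroids


def predict(row, centroids, features):
    """Assign the class whose centroid is closest (squared distance)."""
    def dist2(label):
        return sum((row[j] - c) ** 2 for j, c in zip(features, centroids[label]))

    return min(centroids, key=dist2)


def cross_validated_accuracy(X, y, k, n_keep):
    """k-fold cross-validation with feature selection repeated in each
    training fold. Assumes every training fold contains both classes."""
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(len(X)) if i % k == fold]
        train_idx = [i for i in range(len(X)) if i % k != fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        features = select_top_features(X_train, y_train, n_keep)
        centroids = fit_centroids(X_train, y_train, features)
        correct += sum(predict(X[i], centroids, features) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical expression matrix: column 0 separates the classes,
# columns 1 and 2 are noise.
X = [
    [1.0, 5.0, 3.0], [1.2, 4.0, 2.0], [0.8, 6.0, 4.0], [1.1, 5.5, 3.5],
    [3.0, 5.2, 3.1], [3.2, 4.1, 2.2], [2.9, 6.1, 3.9], [3.1, 5.4, 3.4],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

accuracy = cross_validated_accuracy(X, y, k=4, n_keep=1)
print(f"cross-validated accuracy: {accuracy:.2f}")
```

Selecting features once on the full dataset and cross-validating afterwards would leak information from the held-out samples into the model and give an optimistic estimate, which is why the selection step sits inside the loop.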
Conclusions
Despite growing knowledge of how HBV-HCC develops, early identification and treatment of HBV-induced hepatocyte remodeling and HCC remain difficult. Emerging genomic signatures (indicators) specific to liver cancer that promise better detection and longer survival for patients still require evaluation. Our review summarized the combined use of machine learning algorithms and bioinformatics analysis to extract novel genomic signatures from genomic, transcriptomic, epigenomic, proteomic, metabolomic, and multi-omics datasets. These genomic signatures can serve as diagnostic, therapeutic, and prognostic targets for HBV-induced hepatocyte remodeling and HCC.
