Dynamic Gene Attention Focus (DyGAF): Enhancing Biomarker Identification Through Dual-Model Attention Networks

Abstract

The Dynamic Gene Attention Focus (DyGAF) model is designed to address challenges in biomarker detection, reporting the progression of pathogen infection, and disease diagnostics. DyGAF introduces a novel dual-model attention mechanism within neural networks, combined with machine learning algorithms, to enhance biomarker identification. Going beyond traditional diagnostic approaches, the model analyzes gene expression data in detail: DyGAF not only identifies but also ranks genes by their significance, yielding a list of the top genes essential for disease detection and prognosis. In addition, analyses based on KEGG pathways, WikiPathways, and Gene Ontology provide a multilevel evaluation of the genes' roles. In our analyses, we applied DyGAF to COVID-19 gene expression profiles from nasopharyngeal swabs, which offer a more nuanced view of the intricate interplay between the host and the virus. For further validation, the genes ranked by DyGAF were compared against those selected by differential expression analysis and random forest feature selection. DyGAF identified important biomarkers that enrich gene ontologies and pathways crucial for elucidating the pathogenesis of COVID-19. Furthermore, DyGAF was used to diagnose COVID-19 by classifying gene expression profiles, with an accuracy of 94.23%. Benchmarking against conventional models showed DyGAF's superior performance in identifying and categorizing COVID-19 cases. In summary, DyGAF represents a significant advancement in genomic research, providing a more comprehensive and precise tool for identifying key genetic markers and unraveling the complex biology of a disease.
The DyGAF model is available as a software package at the following link: https://github.com/hiddenntreasure/DyGAF.

Keywords

COVID-19; machine learning; deep learning; RNA-seq; pathogenesis; biomarkers; classification; attention models; gene ontology; pathways

Introduction

Understanding, as early and as accurately as possible, how pathogens develop and affect human cells at the genetic level is essential. This involves studying how pathogens, particularly respiratory viruses, infiltrate and replicate within host cells, exploiting cellular mechanisms to proliferate and spread throughout the body. The recent COVID-19 pandemic, for example, has highlighted how such viruses can alter gene expression patterns, disrupt normal cellular functions, and trigger diverse immune responses. The severity of the disease, ranging from mild to severe, is largely determined by the body's immune reaction to the infection. Epithelial cells lining the airways serve as the primary entry points for respiratory viruses, enabling their replication and systemic dissemination.¹ Understanding the genetic basis of these interactions is indispensable for developing effective treatments and vaccines, as it provides insights into the mechanisms of viral entry, replication, and host-pathogen interactions.²

The SARS-CoV-2 virus, responsible for COVID-19, induces a series of genetic and cellular pathway alterations upon infecting the human body, profoundly influencing the disease's trajectory and severity. Research has revealed significant changes in the expression of genes related to cytokine production and the antiviral innate immune response. Specifically, a cytokine storm, the excessive production of immune proteins driven by altered gene expression,³ leads to severe inflammation and tissue damage in serious COVID-19 cases. Studies have also identified critical pathways, such as JAK-STAT and NF-κB signaling, that regulate immune responses and cytokine production.⁴ In addition, the antiviral innate immune response, the body's initial defense against viral infections, is compromised because SARS-CoV-2 interferes with the interferon signaling pathways that activate antiviral genes. Disruption of these pathways delays the initial immune response, allowing the virus to replicate more freely within the host.⁵

ACE2, TMPRSS2, IL-6, IFI6, and MX1 have been identified as biomarker genes associated with COVID-19. Their impact on the disease cannot be attributed solely to single-gene mechanisms because they participate in complex molecular and genetic interactions. Numerous studies have used gene signature models to distinguish COVID-19 patients from healthy individuals or from patients with other viral diseases.^6-9 These genes, along with others identified through genomic studies, highlight the complex interactions between the virus and the human host. Furthermore, the severity and spread of COVID-19 are influenced by environmental factors such as air quality, temperature, and population density.^10,11 This situation calls for a comprehensive approach to understanding the biology of the virus, from its entry into the body and the initiation of immune responses to the ensuing inflammation and health impacts.

In bioinformatics, statistical methods have long been the conventional approach for analyzing complex biological data. Techniques such as differential expression analysis (DEA),¹² regression analysis,¹³ principal component analysis,¹⁴ and cluster analysis¹⁵ are used to uncover patterns and correlations within large genomic datasets. Machine learning, in turn, is valuable in genomic studies, particularly of COVID-19, because it can automate feature selection, accurately classify disease outcomes, and predict disease severity from genetic markers. For example, Arslan¹⁶ applied machine learning to identify COVID-19 infection and distinguish it from other coronaviruses, thereby improving diagnostics. Another study analyzed microarray data using the binary reptile search algorithm (BRSA) and identified 6 COVID-19-related genes, with a support vector machine (SVM) classifier achieving 87.22% accuracy.¹⁷ In 2020, Peterson et al⁹ used gene expression data to predict COVID-19 illness severity, developing an 18-gene signature that achieved an area under the curve (AUC) of 0.85 in differentiating severe from mild/moderate disease. In addition, an in silico study analyzed SARS-CoV-2 gene expression profiles to classify different stages of infection using feature selection.¹⁸ Overall, machine learning lets researchers move beyond the limitations of traditional analyses, enabling the identification of biomarker genes and deepening our understanding of COVID-19 and other complex diseases. Traditional methods such as DEA emphasize the significance of individual genes but often neglect interactions within larger gene networks, whereas machine learning approaches tend to focus on these interdependent relationships. However, a significant gap remains in addressing both aspects simultaneously.

Researchers have classified COVID-19 patients either by using well-known biomarkers such as ACE2, TMPRSS2,^8,19 IL-6, IFI6,⁶ HERC6, IGF1R,²⁰ and MX1⁸ in so-called 2/3-gene signature models or by analyzing the complete gene expression profiles of COVID-19 patients.^21,22 We propose a method that addresses both aspects, investigating the molecular changes in each gene in response to infection while also considering the complex roles of these genes within gene networks and molecular interactions. In this study, we developed a method called Dynamic Gene Attention Focus (DyGAF) to address this gap. Evaluated on a comprehensive dataset, DyGAF showed high efficacy: compared with traditional methods such as DEA and random forest (RF), it identified novel gene ontology (GO) terms, pathways, and key genes. DyGAF's approach offers a robust tool for understanding the genetic basis of infectious diseases, providing valuable insights for future research and therapeutic development.

Materials and methods

Data collection

For our study, we collected gene expression profiling data from the Gene Expression Omnibus (GEO) database, hosted by the National Center for Biotechnology Information (NCBI), under accession number GSE188678.⁷ Figure 1 shows the overall workflow, including sample collection, RT-PCR amplification, RNA sequencing, and the reporting of biological pathway activation due to COVID-19 infection. To assess DyGAF, we carefully selected and curated COVID-19 RNA-seq datasets for accuracy and relevance. Candidate datasets were filtered with several criteria: each had to be human-based, non-repetitive, include both control (healthy) and case (condition/disease) samples, and contain enough samples for robust analysis. The selected dataset, collected from nasopharyngeal swabs, includes samples from individuals with COVID-19 and with non-viral conditions: 169 control samples and 90 COVID-19 samples, covering approximately 19 939 genes. This selection process supports a thorough and reliable analysis.

Figure 1.

The diagrammatic workflow for data collection (Created with BioRender.com). The diagram illustrates how COVID-19 samples are collected, tested for infection, and processed for RNA sequencing. It also illustrates the subsequent steps involved in aligning the RNA sequences to obtain gene-level count data and identifying the pathways activated in the body of a COVID-19 host.

Data preprocessing

The initial phase involved data preprocessing with established bioinformatics and statistical tools. Transcript abundance was quantified with Kallisto,²³ which pseudoaligns raw RNA-seq reads to a reference transcriptome and estimates the number of reads originating from each transcript. The transcript-level estimates were then imported and summarized to gene-level counts with tximport.²⁴ Finally, the Trimmed Mean of M-values (TMM) method,²⁵ implemented in edgeR,²⁶ was applied to normalize gene expression levels across samples before they were passed to DyGAF. TMM adjusts for differences in library size and RNA composition across samples, so that gene expression levels are comparable between samples.
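The TMM step can be illustrated with a simplified, unweighted sketch. It follows the published idea (for each sample, compare gene proportions against a reference sample, trim the most extreme log-ratios M and average log-expression values A, then average the remaining M values) using edgeR's default trims of 30% and 5%. It is not edgeR's `calcNormFactors`: edgeR additionally applies precision weights and picks the reference sample by its upper quartile, whereas this sketch simply uses the first sample as the reference, an assumption made for illustration.

```python
import math


def tmm_factor(sample, ref, m_trim=0.3, a_trim=0.05):
    """Unweighted TMM scaling factor of `sample` relative to `ref` (raw counts per gene)."""
    n_s, n_r = sum(sample), sum(ref)
    m_vals, a_vals = [], []
    for y_s, y_r in zip(sample, ref):
        if y_s > 0 and y_r > 0:  # M and A are undefined when either count is zero
            p_s, p_r = y_s / n_s, y_r / n_r
            m_vals.append(math.log2(p_s / p_r))        # M: log2 fold change
            a_vals.append(0.5 * math.log2(p_s * p_r))  # A: mean log2 expression
    n = len(m_vals)

    def kept(values, trim):
        # Drop the lowest and highest `trim` fraction of genes by rank.
        lo = math.floor(n * trim)
        order = sorted(range(n), key=values.__getitem__)
        return set(order[lo:n - lo])

    keep = kept(m_vals, m_trim) & kept(a_vals, a_trim)
    if not keep:
        return 1.0
    return 2 ** (sum(m_vals[g] for g in keep) / len(keep))


def tmm_factors(samples):
    """TMM factors for all samples (first sample as reference), rescaled to geometric mean 1."""
    raw = [tmm_factor(s, samples[0]) for s in samples]
    log_mean = sum(math.log(f) for f in raw) / len(raw)
    return [f / math.exp(log_mean) for f in raw]
```

The effective library size of a sample is its raw library size multiplied by its factor. A sample that differs from the reference only in sequencing depth gets a factor of 1, while a sample dominated by one highly expressed gene gets a factor below 1, so that the remaining genes are not artificially deflated.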

Differentially expressed genes (DEGs) were identified with edgeR, using the raw counts as direct input. To control the false discovery rate (FDR) in the DEA, we applied the Benjamini-Hochberg procedure, which adjusts P-values for multiple testing and thereby reduces the number of false-positive genes; genes were considered significant at an adjusted P-value (adj-P-value) threshold of ⩽.05.
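The Benjamini-Hochberg adjustment itself fits in a few lines. The sketch below follows the standard step-up definition (the same quantity R's `p.adjust(p, method = "BH")` returns); it is an illustration, not edgeR's internal code, and the P-values in the usage are toy values.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the original order (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)  # largest P first
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank of this P-value in ascending order
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min  # cumulative minimum keeps adjusted values monotone
    return adjusted


def significant(p_values, alpha=0.05):
    """Indices of tests whose adjusted P-value is <= alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q <= alpha]
```

For example, the raw P-values [0.01, 0.04, 0.03, 0.005] adjust to approximately [0.02, 0.04, 0.04, 0.02], so all 4 tests pass the ⩽.05 threshold.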

Attention-based feature selection mechanism

We adopted attention mechanisms as a fundamental principle for dynamically prioritizing the most relevant features within the expansive genetic landscape, enabling precise identification of genes critical to disease outcomes.^27,28 Certain genes operate autonomously, exerting direct influence on disease outcomes through their interactions with cellular components. For instance, genes involved in viral entry, replication, or regulation of the immune response can independently affect host responses to infections like COVID-19.^29-32 Conversely, some genes function more dependently, with their activity influenced by other genes, environmental factors, or cellular signaling pathways. In the context of COVID-19, such dependent action may involve immune-related genes or cytokine signaling pathways, where a disturbance in one gene's expression can cascade through the network and affect disease outcomes.²⁸

To capture both independent and dependent gene behaviors, we proposed DyGAF, a combined method comprising 2 separate models. DyGAF was initially inspired by the attention models proposed by Bahdanau²⁷ and Shaw et al,³³ but is tailored to biological feature selection rather than sequence modeling. Unlike traditional attention mechanisms that rely solely on additive scoring or scaled dot-product attention, DyGAF uses element-wise multiplication (Model A) for independent feature weighting and dot-product transformations (Model B) to capture interdependencies, in keeping with how genes act both individually and within regulatory networks.

Model A (independent gene functions analysis)

Model A focuses on the independent roles of genes, evaluating each gene's individual effect on disease dynamics without considering its interactions with other genes or environmental factors. It uses a custom attention layer, developed with TensorFlow and Keras, to learn weights and compute an attention score for each gene, as illustrated in Figure 2A.^32,34 Model A's attention mechanism ignored interdependencies among input elements. It began with an input matrix X with elements x_ij, where i and j give each element's position in the matrix. The model then calculated the attention scores (α) by applying the hyperbolic tangent function (tanh) to the element-wise product of the input matrix and a weight matrix W (with elements w_ij), plus a bias b:

α = \tanh (X ⊙ W + b)

(1)

Following this, the model computed the attention weights (β) by again applying the tanh function to α:

β = \tanh (α)

(2)

These attention weights were then applied to the input matrix X through another round of element-wise multiplication to generate a weighted output matrix:

W_{output} = β ⊙ X

(3)

To ensure operational uniformity and consistent output across inputs and batches, we introduced a zero-filled matrix P of predefined size. The matrix addition is followed by a division that normalizes the weighted output. Let $W_{A}$ denote the final weighted output for Model A:

W_{A} = \frac{P + W_{output}}{2^{3}}

(4)

To ensure stable feature scaling, we applied power-of-two normalization, empirically testing divisors $2^{k}$ (k = 1, 2, 3, …, n); k = 3 provided the best stability and interpretability. This prevents extreme variations in attention-weighted values, keeps feature importance scores balanced, and facilitates subsequent interpretation and analysis. Unlike quantization, this scaling keeps the weights in floating-point format while stabilizing the model's outputs and keeping them interpretable.³⁵
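Equations (1) to (4) can be traced with a plain-Python sketch for a single batch. This is not the authors' Keras layer; it assumes that W has the same shape as X (following the w_ij notation above) and that b holds one bias per gene, broadcast over samples. The zero matrix P is represented by adding 0.0.

```python
import math


def model_a_forward(X, W, b, divisor=2 ** 3):
    """Model A, Eqs (1)-(4): element-wise attention with no cross-gene interaction.

    X, W: lists of rows with identical shape (samples x genes); b: one bias per gene.
    """
    output = []
    for x_row, w_row in zip(X, W):
        row = []
        for j, (x, w) in enumerate(zip(x_row, w_row)):
            alpha = math.tanh(x * w + b[j])  # Eq (1): tanh(X ⊙ W + b)
            beta = math.tanh(alpha)          # Eq (2): attention weight
            weighted = beta * x              # Eq (3): W_output = β ⊙ X
            row.append((0.0 + weighted) / divisor)  # Eq (4): (P + W_output) / 2**3 with P = 0
        output.append(row)
    return output
```

Because each output element depends only on x_ij, w_ij, and b_j, changing one gene's expression never changes another gene's weight, which is the "independent" property that distinguishes Model A.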

Figure 2.

Fundamental attention mechanisms in neural network architecture. (A) Illustration of 2 methods for calculating attention weights in Models A and B. This panel highlights the steps that define the models’ specific independent or dependent natures, using element-wise multiplication and dot products, respectively, each followed by bias addition to compute the final attention scores. (B) This panel illustrates how attention weights are combined to calculate feature importance, emphasizing their computational steps and influence on input relevance.

Through these steps, Model A assesses the importance of each feature in the dataset, enabling a detailed, feature-focused analysis of biological data. This approach prioritizes individual feature contributions over interactions between features.

Model B (dependent gene functions analysis)

To capture how genes interact within complex regulatory networks and affect each other's expression, Model B used gene expression data to infer gene interdependencies through an attention mechanism. It examined gene expression with the objective of capturing the intricate interplay between genes and identifying key features (genes) that collectively contribute to disease progression.³⁶ This dependent model adopted a more complex attention mechanism that accounts for the relationships among input data elements, as shown in Figure 2A. The model began with the input matrix X and computed initial attention scores (α′) by applying a sigmoid function (σ) to a tanh-activated linear transformation of X, computed as the dot product of X and a weight matrix W plus a bias b:

α′ = σ(\tanh(XW + b))

(5)

Here, the bias term b is broadcast and added to the result of the dot product, so that each element of the resulting matrix is shifted by the corresponding bias value, which increases the model's flexibility and learning capacity.³⁷ Subsequently, the model computed a weighted output by averaging the attention scores over axis 0, the sample (row) dimension of X, which yields one attention score per feature, and multiplying this mean element-wise with the input matrix X:

W_{output_1} = \mathrm{mean}(α′, \text{axis} = 0) ⊙ X

(6)
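Equations (5) and (6) can be sketched the same way in plain Python. Assumptions not stated in the text: W is a square genes × genes matrix so that XW keeps the shape of X, b has one entry per gene, and rows of X are samples, so the axis-0 mean leaves one score per gene that is broadcast back over X.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matmul_bias(X, W, b):
    """Dot product X @ W plus a bias vector b broadcast over rows (one entry per column)."""
    return [[sum(x * w for x, w in zip(row, col)) + b[k] for k, col in enumerate(zip(*W))]
            for row in X]


def column_mean(M):
    """Mean over axis 0 (samples), giving one value per column (gene)."""
    return [sum(col) / len(col) for col in zip(*M)]


def model_b_step1(X, W, b):
    """Eq (5): alpha' = sigmoid(tanh(XW + b)); Eq (6): mean over samples, then re-weight X."""
    alpha1 = [[sigmoid(math.tanh(z)) for z in row] for row in matmul_bias(X, W, b)]
    scores = column_mean(alpha1)
    return [[s * x for s, x in zip(scores, row)] for row in X]
```

Unlike Model A, each gene's score here depends on every gene in the sample through the dot product with W, which is how Model B encodes interdependence.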

Next, the model recalculated attention scores (α′′) by applying a sparsemax activation to the dot product of the previously obtained weighted output $W_{output_1}$ and a weight matrix W, plus a bias b, that is, $α′′ = \mathrm{sparsemax}(W_{output_1} W + b)$. The new attention scores α′′ were then used to update the weighted output in the same way as before:

W_{output_2} = \mathrm{mean}(α′′, \text{axis} = 0) ⊙ X

(7)
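The sparsemax step and Eq (7) can be sketched as follows. Sparsemax (Martins and Astudillo, 2016) projects a score vector onto the probability simplex, so, unlike softmax, it can assign exactly zero weight to some genes. Assumptions: sparsemax is applied across genes within each sample row, and the weight matrix W (genes × genes) and bias b of this step are simply passed in as arguments, whether or not they are shared with Eq (5).

```python
def sparsemax(z):
    """Sparsemax: Euclidean projection of vector z onto the probability simplex."""
    z_sorted = sorted(z, reverse=True)
    cumulative, k_z, cum_at_k = 0.0, 0, 0.0
    for k, value in enumerate(z_sorted, start=1):
        cumulative += value
        if 1.0 + k * value > cumulative:  # the k-th largest score is still in the support
            k_z, cum_at_k = k, cumulative
    tau = (cum_at_k - 1.0) / k_z  # threshold subtracted from every score
    return [max(v - tau, 0.0) for v in z]


def model_b_step2(X, W_output_1, W, b):
    """alpha'' = sparsemax(W_output_1 W + b) per sample; Eq (7): mean over samples, re-weight X."""
    logits = [[sum(x * w for x, w in zip(row, col)) + b[k] for k, col in enumerate(zip(*W))]
              for row in W_output_1]
    alpha2 = [sparsemax(row) for row in logits]
    scores = [sum(col) / len(col) for col in zip(*alpha2)]  # axis 0: one score per gene
    return [[s * x for s, x in zip(scores, row)] for row in X]
```

For example, `sparsemax([3.0, 1.0, 0.2])` returns `[1.0, 0.0, 0.0]`: the two weaker genes receive exactly zero attention, whereas softmax would still give them small positive weights.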

A zero-filled matrix P′, dynamically sized according to the last dimension of X, was used to ensure dimensionality alignment during matrix addition, followed by normalization. Let $W_{B}$ denote the final weighted output for Model B:

W_{B} = \frac{P′ + W_{output_2}}{2^{3}}

(8)

Model B aims to capture the nuanced dynamics of gene regulatory networks, offering insights into genes’ interconnected roles in disease contexts. The architecture enhanced interpretability by returning attention weights, facilitating a focused analysis of critical genetic factors in disease susceptibility and progression.

Model training, experimentation, and evaluation

During model training, experimentation, and evaluation, the parameters and hyperparameters of both Model A and Model B were tuned. A validation split of 20% was held out from the dataset to monitor the model's performance on unseen data and thereby reduce the risk of overfitting. Hyperparameters were tuned over batch sizes of 8, 16, 32, and 64, each trained for 100 epochs; a batch size of 16 yielded the best results. This iterative search identifies the configuration that best balances training efficiency and model performance. We used the Adam optimizer in TensorFlow and Keras, which adapts learning rates during training and handles large, high-dimensional datasets well, together with the binary cross-entropy loss function, providing a robust framework for training the attention model.
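The loss and the 20% hold-out can be sketched without TensorFlow. The loss function follows the usual definition of mean binary cross-entropy, with probabilities clipped to avoid log(0) (Keras also clips, though its exact epsilon is an implementation detail). The split function mimics Keras's `validation_split`, which takes the validation data from the last samples before any shuffling. Both are illustrations, not the authors' code.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def validation_split(samples, fraction=0.2):
    """Split off the last `fraction` of samples as validation data (no shuffling)."""
    split_at = math.floor(len(samples) * (1.0 - fraction))
    return samples[:split_at], samples[split_at:]
```

With the 259 samples in this dataset (169 control + 90 COVID-19), `validation_split(list(range(259)))` yields 207 training and 52 validation samples. Because this sketch holds out the tail of the array, rows should be shuffled first if they are ordered by class.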

We also implemented multi-head attention models based on the core principles of attention mechanisms introduced in the seminal work of Vaswani et al.²⁸ Each model processes its input independently across 2, 4, 6, or 8 heads; the head outputs are either concatenated or used directly, then flattened and passed through a sigmoid-activated dense layer. Our experiments showed that the 4-head model best balanced performance and interpretability, with each head containing 239 269 trainable parameters; the outputs of all 4 heads were concatenated to form the representation underlying the model's decisions. To validate the robustness of our findings, we also performed 5-fold cross-validation (k = 5). By rotating through each subset as the testing set, we obtained multiple evaluations of the model's performance across different data segments, which gives a more reliable performance estimate and reduces bias from any single data partition.
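The 5-fold rotation amounts to index bookkeeping, sketched below. This version uses contiguous, unshuffled folds whose sizes differ by at most one sample (the same convention as scikit-learn's `KFold` without shuffling); the text does not say whether the folds were shuffled or stratified by class, so that remains an open choice here.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous folds without shuffling."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test  # every sample lands in exactly one test fold
        start += size
```

For the 259 samples used here, the test folds contain 52, 52, 52, 52, and 51 samples.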

We compared our model's results with those of DEA and the RF-based feature selection method. This comparison shed light on our model's effectiveness in capturing genetic factors relevant to the disease. By integrating machine learning interpretability with deep learning sophistication, our model went beyond the scope of DEG analysis, offering refined biomarker identification.

Combination of attention-based weighted values and RF-based feature selection in the DyGAF method

The combination of attention weights from the independent (Model A) and dependent (Model B) frameworks forms the foundation of our approach (Figure 2B). Given the intricate nature of the biological pathways involved, we used a custom scoring function to combine the attention weights into a single measure of gene relevance.³⁸ In bioinformatics and functional genomics, where determining gene significance is crucial, this dual-model system offers a robust framework for integrating and evaluating gene importance. Specifically, consider a data matrix X with i instances (samples) and j features (genes), together with 2 vectors of normalized weighted values, W_A and W_B, holding each gene's weighted value from Model A and Model B, respectively.

Formally, for each gene j, the combined score (weighted value) $w_{j}$ is given by:

w_{j} = 2\left(\frac{\min(w_{Aj}, w_{Bj}) + w_{Aj} \cdot w_{Bj}}{2}\right)^{2}

(9)

Taking the minimum of $w_{Aj}$ and $w_{Bj}$ gives a conservative assessment of feature importance: a gene that either model considers weakly influential keeps a low final score. This is crucial where avoiding false positives is paramount. The product of $w_{Aj}$ and $w_{Bj}$ rewards genes that both models consider important, promoting a unified assessment of each gene's importance. Averaging these 2 components balances the approach, preventing either the conservative or the aggressive estimate from dominating. Squaring then disproportionately amplifies higher scores, prioritizing features on which both models agree strongly, and multiplying by 2 rescales the final combined weighted value $w_{j}$ so that it distinctly reflects both concurrence on high importance and agreement on low importance. Such a scheme suits complex fields like genomics, where uncovering the subtle yet pivotal roles of genes requires a nuanced approach; it supports a deeper understanding of gene function and enhances interpretability.
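A worked example of Eq (9) makes this behavior concrete. The helper below is a direct transcription of the equation, assuming weights normalized to [0, 1]; the gene names and weight values are hypothetical.

```python
def combined_score(w_a, w_b):
    """Eq (9): w_j = 2 * ((min(w_Aj, w_Bj) + w_Aj * w_Bj) / 2) ** 2."""
    return 2.0 * ((min(w_a, w_b) + w_a * w_b) / 2.0) ** 2


def rank_genes(genes, weights_a, weights_b):
    """Sort genes by their combined score, highest first."""
    scores = {g: combined_score(a, b) for g, a, b in zip(genes, weights_a, weights_b)}
    return sorted(genes, key=scores.get, reverse=True), scores


# Hypothetical weights: both models moderately favor G1; only Model A favors G2.
order, scores = rank_genes(["G1", "G2", "G3"], [0.5, 1.0, 0.1], [0.5, 0.1, 0.1])
```

Here G1 (0.5, 0.5) scores 0.28125, while G2, rated 1.0 by Model A but only 0.1 by Model B, scores about 0.02, so disagreement between the models is penalized far more than a uniformly moderate rating. Two maximal weights (1.0, 1.0) give the largest possible score, 2.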

Finally, we computed weighted values by multiplying the combined weights with the gene expression values, and applied an RF-based feature selection technique³⁹ to these weighted values. This complete procedure, RF-based gene selection on the integrated attention-weighted values, is what we call DyGAF. RF is widely used because it effectively identifies the features most useful for distinguishing between 2 classes. We also evaluated established combination metrics, including the geometric mean, Borda count, and harmonic mean, using RF to assess classification effectiveness and mutual information (MI) to assess information retention. As shown in Table S1 in Supplemental File 1, our custom metric outperformed these traditional methods in both accuracy and information preservation. This approach improved the reliability of our feature selection and supported accurate identification of key genes associated with COVID-19.
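For comparison, the alternative combination rules mentioned above can be sketched with their textbook definitions: the geometric and harmonic means are applied per gene, and the Borda count awards points by rank position within each model. The RF and mutual-information evaluation used in the study is not reproduced here, and the weights in the usage are toy values.

```python
import math


def geometric_mean(a, b):
    return math.sqrt(a * b)


def harmonic_mean(a, b):
    return 0.0 if a + b == 0 else 2.0 * a * b / (a + b)


def borda_scores(w_a, w_b):
    """Borda count: each model gives n-1 points to its top gene, ..., 0 to its last; points add up."""
    n = len(w_a)
    points = [0] * n
    for weights in (w_a, w_b):
        ranked = sorted(range(n), key=weights.__getitem__, reverse=True)
        for position, gene in enumerate(ranked):
            points[gene] += n - 1 - position
    return points
```

Unlike Eq (9), the Borda count discards the magnitudes of the weights and keeps only their order; for example, `borda_scores([0.9, 0.1, 0.5], [0.8, 0.3, 0.2])` gives `[4, 1, 1]`, tying the second and third genes even though their weights differ.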

COVID-19 classifier

DyGAF was also used to distinguish COVID-19 patients from healthy individuals based on their gene expression profiles. To assess the classification efficacy of DyGAF, we employed RF, support vector machines (SVM), and K-nearest neighbors (KNN) to differentiate COVID-19 samples from healthy ones. In total, 5 models were compared: DyGAF, RF, DyGAF-SVM, DyGAF-KNN, and DEA-RF. DyGAF, DyGAF-SVM, and DyGAF-KNN were applied to the features (ie, genes) identified by the DyGAF method, whereas the RF model was applied to all genes, and the DEA-RF model used RF to classify samples based on the genes deemed significant by DEA. During classification, sensitivity was prioritized so that COVID-19-positive samples would not be misclassified as healthy.
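The metrics reported in Table 1 follow standard definitions, sketched below with COVID-19 as the positive class. Sensitivity, the quantity prioritized above, is the fraction of COVID-19 samples predicted as COVID-19; specificity is the fraction of healthy samples predicted as healthy. The labels in the usage are toy values, not study data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, accuracy, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # COVID-19 cases correctly detected
    specificity = tn / (tn + fp) if tn + fp else 0.0  # healthy samples correctly cleared
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": (tp + tn) / len(pairs), "f1": f1}
```

With true labels `[1, 1, 1, 0, 0, 0, 0]` and predictions `[1, 1, 1, 1, 0, 0, 0]`, sensitivity is 1.0 but specificity is 0.75: a classifier can miss no positive case and still flag some healthy samples, which is why both columns are reported.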

GO and pathways analysis

We used GO, KEGG, and WikiPathways, which are among the most useful resources for annotating gene functions, investigating biological pathways, and understanding the intricate connections between genes and molecules. The enrichment analysis web tool Enrichr (https://maayanlab.cloud/Enrichr/) was used to perform the functional analysis (GO, KEGG, and WikiPathways) of the list of most significant genes.⁴⁰ A functional annotation was considered statistically significant at an adjusted P-value ⩽.05.
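The significance of an enrichment term comes from an over-representation test. Enrichr reports Fisher's exact test P-values with Benjamini-Hochberg adjustment; the one-sided hypergeometric tail below is the equivalent calculation for a single term, given as a sketch with toy numbers rather than as Enrichr's code.

```python
from math import comb


def enrichment_p_value(n_overlap, n_list, n_term, n_background):
    """P(X >= n_overlap) for X ~ Hypergeometric: the chance that a random list of n_list genes,
    drawn from n_background genes, shares at least n_overlap genes with a term of n_term genes."""
    upper = min(n_list, n_term)
    hits = sum(comb(n_term, k) * comb(n_background - n_term, n_list - k)
               for k in range(n_overlap, upper + 1))
    return hits / comb(n_background, n_list)
```

For a top-100 gene list tested against a 50-gene term in a background of 20 000 genes, about 0.25 overlapping genes are expected by chance, so an overlap of 5 yields a far smaller P-value than an overlap of 1.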

Protein-protein interaction and hub proteins

To unravel the protein interaction networks underlying the COVID-19 dataset, we used the STRING database (http://string-db.org) (version 12.0), a comprehensive resource that integrates known and predicted protein-protein interactions.⁴¹ The top 100 significant genes were used to construct the protein-protein interaction (PPI) network in STRING, and Cytoscape (https://cytoscape.org/) was used to visualize it.⁴² To pinpoint hub genes within the network, we used the CytoHubba plugin, a widely used tool that identifies key nodes in biological networks from their topological features.⁴³ Four of the ranking methods available in CytoHubba, namely Maximum Neighborhood Component (MC), Degree, Betweenness, and Closeness, were used to rank the genes. The top 10 hub genes formed a high-centrality subnetwork, suggesting that they are likely to play crucial roles in COVID-19 pathogenesis through their PPI network activity.
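Two of these rankings, Degree and Closeness, are easy to sketch for an unweighted, undirected PPI network. The closeness formula here (number of reachable nodes divided by the sum of their shortest-path distances) is a common textbook variant and may differ in detail from CytoHubba's implementation; the protein names are hypothetical.

```python
from collections import deque


def degree_centrality(edges):
    """Number of interaction partners per protein in an undirected edge list."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


def closeness_centrality(edges):
    """Reachable nodes / sum of shortest-path lengths, using BFS on an unweighted graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def top_hubs(scores, n=10):
    """Names of the n highest-scoring nodes."""
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

In the toy network `[("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5")]`, P1 ranks first by both measures, with degree 3 and closeness 4/5 = 0.8.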

TF-gene regulatory network

In this study, we identified a set of significant genes whose expression is regulated by transcription factors (TFs). To determine the regulatory relationships between TFs and target genes, we used NetworkAnalyst (version 3.0), a web tool for biological analysis.⁴⁴ Specifically, we analyzed the top 100 significant genes identified by the attention-based model to elucidate their regulatory networks, using the JASPAR⁴⁵ and ChEA⁴⁶ databases, which are widely accepted resources for TF-gene interactions. Finally, Cytoscape (https://cytoscape.org/) was used to visualize the network.

Results

We developed DyGAF, a comprehensive framework that offers a nuanced understanding of genomic expression patterns linked to a condition. The process begins with the acquisition of expression profiling data and rigorous preprocessing, including TMM normalization, data cleaning, and transformation; the pipeline then proceeds to model building to identify and rank biomarker genes associated with COVID-19, providing insights into the molecular signatures of the disease (Figure 3). In the final stage, a comprehensive molecular analysis of COVID-19 is conducted with bioinformatics tools and databases: KEGG and WikiPathways for pathway mapping, GO for functional annotation, and analyses of protein-protein interaction networks, hub proteins, and TF-gene interactions. By combining these analytical approaches, we obtained a holistic view of COVID-19 at the molecular level, facilitating the identification of key biological mechanisms and potential therapeutic targets.

Figure 3.

Overview of data processing and analysis for COVID-19 gene expression profiling (Created with BioRender.com). (A) Gene expression profiling data from nasopharyngeal swabs were collected from the Gene Expression Omnibus (GEO) database. (B) Structured data were created by performing Trimmed Mean of M-values (TMM) normalization, data cleaning, and data transformation. (C) A dual attention-based model was built and trained, together with a random forest algorithm used as a classifier for feature selection. (D) Top features were analyzed using Kyoto Encyclopedia of Genes and Genomes (KEGG) and WikiPathways pathway analysis, Gene Ontology (GO) enrichment analysis, protein-protein interaction (PPI) network analysis, hub-protein identification, and TF-gene interaction analysis.

Data preprocessing

The boxplots in Figure 4 show the distribution of gene expression across samples before (Figure 4A) and after (Figure 4B) TMM normalization. Each boxplot represents the log2-transformed expression values of genes across the samples. After TMM normalization (Figure 4B), the medians of all samples fall within a narrower range of 3 to 6. Normalization rescaled the samples to reduce the effects of differences in library size and composition, improving consistency across samples. This makes downstream analyses more reliable, so that biological insights drawn from the RNA-seq data reflect true biological differences rather than technical variability.

Figure 4.

Comparison of gene expression before and after Trimmed Mean of M-values (TMM) normalization, using log2-transformed expression values. (A) Data before TMM normalization and (B) data after TMM normalization.

Gene ranking using attention-based mechanism

The attention-based feature selection technique, DyGAF, was used to rank genes by feature importance. This approach identified 923 genes with feature importance values greater than zero (Supplemental Table 1). These rankings point to candidate genetic biomarkers linked to COVID-19 and to the biological mechanisms underlying the disease. Specifically, DyGAF highlights the persistent importance of genes and how they affect disease progression. In our analysis, we focused on the top 100 genes to balance comprehensiveness against manageability in downstream validation and potential clinical applications. This threshold is flexible; future studies may include more or fewer genes depending on research or therapeutic goals.

Comparative assessment of the conventional methods (DEA and RF)

In our investigation, we employed the traditional DEA approach to identify potential biomarkers (genes) associated with COVID-19. In addition, we used RF, a prominent feature selection method in machine learning, to perform biomarker identification. Results from both approaches were compared with our new method to showcase the robustness of our model.

Using edgeR, a total of 1702 DEGs for COVID-19 patients were identified from the GSE188678 dataset. These DEGs were filtered at an adjusted P-value threshold of ⩽.05, and for a comparative downstream study, the top 100 DEGs were selected using a stricter criterion of an adjusted P-value ⩽.005 (Supplemental Table 2). Furthermore, the conventional RF-based feature selection approach initially yielded 918 relevant genes in the dataset (see Supplemental Table 3). To visualize the overlap and uniqueness of the genes identified by the 3 methods, we constructed the Venn diagram shown in Figure 6A.
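A 3-set Venn diagram such as Figure 6A partitions the gene lists into 7 regions, and the counts can be computed with set operations as sketched below. The gene identifiers in the usage are placeholders, not the study's gene lists.

```python
def venn_regions(a, b, c):
    """Sizes of the 7 regions of a 3-set Venn diagram (eg, top genes from 3 methods)."""
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }
```

The 7 counts always sum to the size of the union of the 3 lists, which is a quick sanity check when reading such a diagram.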

COVID-19 classifier performance comparison

In our study, DyGAF performed best at classifying COVID-19 samples from nasopharyngeal swabs based on gene expression profiles, achieving the highest testing accuracy and F1-score among the compared models, including DEA-RF, RF, DyGAF-SVM, and DyGAF-KNN (Table 1). DyGAF also reached a sensitivity of 100% (Figure 5A), identifying every COVID-19 case without any false negatives. These findings highlight the potential of DyGAF as a robust tool for diagnostic applications based on gene expression analysis. More information about the classification is provided in the Additional Results of Supplemental File 1.

Table 1.

Classification report comparing dynamic gene attention focus (DyGAF), its component models A and B, random forest (RF), differential expression analysis with random forest (DEA-RF), support vector machines (DyGAF-SVM), and K-nearest neighbors (DyGAF-KNN) models based on the gene expression profiles from human nasopharyngeal swabs.

Model	Training accuracy (%)	Testing accuracy (%)	F1-score (%)	Specificity (%)	Sensitivity (%)
DyGAF	100	94.23	96	91.9	100
DyGAF (Model A)	100	88.46	92	100	66.67
DyGAF (Model B)	100	90.38	93	100	72.23
RF	100	88.46	80	85	100
DyGAF-SVM	100	86.53	90	86.5	86.7
DyGAF-KNN	90.82	76.29	84	97	80
DEA-RF	100	92	94	77.78	100

Bold values are the results yielded from the DyGAF method.

Figure 5.

Confusion matrices for the classification of COVID-19 infection in humans based on ranked gene expression profiles, comparing the performance of 4 computational models: (A) dynamic gene attention focus (DyGAF), (B) random forest, (C) support vector machines (DyGAF-SVM), and (D) K-nearest neighbors (DyGAF-KNN).

GO and pathway (KEGG and Wiki) analysis

Furthermore, GO and pathway analyses were conducted on the top 100 genes identified by each of the 3 methods to validate and select the optimal model. Comparisons were based on enriched biological processes (BP) from GO analysis and on pathways from KEGG and WikiPathways mapping. Our novel approach identified the largest number of enriched GO terms, elucidating biological processes strongly linked to the progression and severity of COVID-19 in host cells (Figure 6B). Figures 6C to E illustrate the top 20 statistically significant COVID-19-related GO terms identified by the DyGAF, DEA, and RF models, respectively. For several GO terms and pathways, DyGAF also captured more overlapping genes than DEA and/or RF. Table 2 presents the significant COVID-19-related KEGG pathways with adjusted P-values ⩽.05. Notably, for key pathways such as Coronavirus disease (11/232 genes) and Cytokine-cytokine receptor interaction (6/295 genes), DyGAF yielded the highest gene overlap of the 3 methods, highlighting its capability in pathway analysis. Similarly, Table 3 presents the significant COVID-19-related WikiPathways identified by the 3 models. All 9 listed WikiPathways reached adjusted P-values ⩽.05 for both DyGAF and DEA, and DyGAF gave the strongest enrichment for the top-ranked Network map of SARS-CoV-2 signaling pathway (adjusted P = 1.10e-20; 22/218 genes). Key pathways identified by DyGAF include the Network Map of SARS-CoV-2 Signaling and Type I Interferon Induction and Signaling. These pathways, supported by in vitro and in vivo studies conducted over the past 5 years,⁴⁷ are crucial for COVID-19 diagnosis and medication development.^4,5
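Enrichment P-values of the kind reported in Tables 2 and 3 are typically derived from an over-representation test. The sketch below computes a one-sided hypergeometric tail probability (equivalent to a one-sided Fisher's exact test) for an overlap written as "k/n", such as "11/172". The background of 20,000 genes is a hypothetical assumption, and enrichment tools such as Enrichr may use a different background and further corrections, so this sketch will not reproduce the published values.

```python
from math import comb


def enrichment_pvalue(overlap, query_size, set_size, background_size):
    """One-sided hypergeometric P(X >= overlap).

    overlap:         genes shared by the query list and the pathway (11 in "11/172")
    query_size:      number of input genes (e.g. the top 100)
    set_size:        pathway size (172 in "11/172")
    background_size: assumed universe of annotated genes (hypothetical here)
    """
    upper = min(query_size, set_size)
    total = comb(background_size, query_size)
    tail = sum(comb(set_size, i) * comb(background_size - set_size, query_size - i)
               for i in range(overlap, upper + 1))
    return tail / total  # exact integer arithmetic, then a single float division


p_raw = enrichment_pvalue(11, 100, 172, 20000)  # raw (unadjusted) P-value
```

Working with exact integers avoids underflow in the intermediate binomial coefficients; only the final ratio is converted to a float. These raw P-values are then adjusted for multiple testing before thresholding at ⩽.05.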

Figure 6.

Comparative analysis of COVID-19 genes and gene ontology (GO) terms. (A) Venn diagram illustrating the overlapping and unique genes identified by 3 different models: dynamic gene attention focus (DyGAF), differential expression analysis (DEA), and random forests (RF). (B) Venn diagram displaying the shared and unique counts of statistically enriched COVID-19-related GO terms among the DyGAF, DEA, and RF models. (C-E) Bubble plots illustrating the top 20 statistically significant COVID-19-related GO terms identified by DyGAF, DEA, and RF, respectively. Bubble size represents the gene count, and color intensity indicates the level of statistical significance (P-value).

Table 2.

Statistically significant COVID-19-related KEGG pathways for the 3 methods: dynamic gene attention focus (DyGAF), differential expression analysis (DEA), and random forest (RF).

Term	DyGAF adj P-value	DyGAF overlap	DEA adj P-value	DEA overlap	RF adj P-value	RF overlap
Influenza A	9.48e-08	11/172	1.22e-07	11/172	1.53e-06	10/172
Coronavirus disease	1.09e-06	11/232	7.64e-06	10/232	0.00278	7/232
NOD-like receptor signaling pathway	4.10e-04	7/181	9.92e-07	10/181	0.00430	6/181
Viral protein interaction with cytokine and cytokine receptor	0.01155	4/100	0.00220	5/100	0.06979	3/100
Cytokine-cytokine receptor interaction	0.02152	6/295	0.15178	5/295	0.38629	3/295
Complement and coagulation cascades	0.03987	3/85			0.52771	1/85
Toll-like receptor signaling pathway	0.19845	2/104	0.02030	4/104	0.26085	2/104
Chemokine signaling pathway	0.17746	3/192	0.00102	7/192	0.43431	2/192

For each method, P-values were corrected for multiple comparisons using the Benjamini-Hochberg procedure. Overlap indicates the number of genes from each model's list that are shared with the reference pathway, out of the total number of genes in that pathway.
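The Benjamini-Hochberg adjustment named in the table note can be sketched in plain Python. This is the standard step-up procedure, offered as an illustration rather than the exact code used by the enrichment tool; in practice, R's `p.adjust(method = "BH")` or statsmodels' `multipletests(..., method="fdr_bh")` give equivalent results.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

The running minimum enforces that a smaller raw P-value never receives a larger adjusted value than a bigger one, which is what makes the procedure "step-up".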

Table 3.

Statistically significant COVID-19-related WikiPathways for the 3 methods: dynamic gene attention focus (DyGAF), differential expression analysis (DEA), and random forest (RF).

Term	DyGAF adj P-value	DyGAF overlap	DEA adj P-value	DEA overlap	RF adj P-value	RF overlap
Network map of SARS CoV 2 signaling pathway WP5115	1.10e-20	22/218	1010e-13	17/218	9.05e-14	17/218
SARS CoV 2 innate immunity evasion and cell-specific immune response WP5039	1.85e-09	9/66	5.46e-11	10/66	1.27e-06	7/66
Type I interferon induction and signaling during SARS CoV 2 infection WP4868	5.26e-09	7/31	4.78e-09	7/31	3.90e-07	6/31
Host pathogen interaction of human coronaviruses interferon induction WP4880	2.55e-07	6/32	2.49e-07	6/32	1.34e-05	5/32
SARS coronavirus and innate immunity WP4912	2.04e-04	4/30	0.00505	3/30	2.39e-04	4/30
Extrafollicular and follicular B cell activation by SARS CoV 2 WP5218	0.00505	4/74	0.04028	3/74	0.42799	1/74
Mitochondrial immune response to SARS CoV 2 WP5038	0.00505	3/32	0.00505	3/32	0.00545	3/32
Host pathogen interaction of human coronaviruses MAPK signaling WP4877	0.00672	3/36	4.43e-04	4/36	4.46e-04	4/36
mRNA vaccine activation of dendritic cell and induction of IFN 1 WP5187	0.00843	2/10	2.26e-04	3/10	2.39e-04	3/10

For each method, P-values were corrected for multiple comparisons using the Benjamini-Hochberg procedure. Overlap indicates the number of genes from each model's list that are shared with the reference pathway, out of the total number of genes in that pathway.

Moreover, the supplemental tables provide a comprehensive view, including the complete GO and pathway analyses. Supplemental Tables 4 through 6 present the enriched biological processes identified through GO analysis and the pathways identified through KEGG and WikiPathways mapping, respectively, using the top 100 genes identified by DyGAF. Supplemental Tables 7 through 9 show the corresponding results for the top 100 genes determined through DEA, and Supplemental Tables 10 through 12 contain the corresponding results for the top 100 genes identified using the RF approach. The relevance of the identified GO terms and pathways to COVID-19 is discussed in Supplemental File 1.

PPI and hub proteins analysis for biomarker gene identification

To uncover direct molecular interactions, we imported the top 100 genes identified by DyGAF into the STRING database to construct a COVID-19-relevant PPI network. Several STRING parameters, including Network Type, Interaction Sources, and Minimum Required Interaction Score, were set to ensure the specificity and reliability of the interactions. In particular, the minimum required interaction score was set to 0.900 (highest confidence), so that only interactions with the strongest supporting evidence were included. The resulting network, visualized in Cytoscape, comprised 28 genes (nodes) and 46 edges (Figure 7A). CytoHubba was then used to identify the top 10 hub proteins shared across ranking algorithms, including Maximal Clique Centrality (MCC), Degree, Betweenness, and Closeness. These hub proteins, which occupy central positions in the network, were CXCL13, DDX58, DHX58, OAS1, OAS3, ASF1A, OAS2, H4C14, H3C13, and H2AC4 (Figure 7B). Further discussion of this result can be found in the Additional Discussions of Supplemental File 1.
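The consensus-hub step described above (filter edges by confidence, score nodes with several centrality measures, and keep nodes ranked highly by all of them) can be sketched in plain Python. This is not cytoHubba's code: only Degree, Closeness, and MCC are implemented (Betweenness is omitted for brevity), MCC follows the cytoHubba definition as a sum of (|C| - 1)! over the maximal cliques C containing a node, and the toy network with nodes A to E is hypothetical. Scores are assumed to be on a 0 to 1 scale; STRING exports often report scores scaled to 0 to 1000, in which case 0.900 corresponds to 900.

```python
from collections import deque
from math import factorial


def build_graph(scored_edges, min_score=0.9):
    """Undirected adjacency sets, keeping only edges with score >= min_score."""
    adj = {}
    for u, v, score in scored_edges:
        if score >= min_score:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness(adj):
    """(reachable nodes - 1) / sum of shortest-path distances, via BFS."""
    scores = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        scores[src] = (len(dist) - 1) / total if total else 0.0
    return scores


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; yields each maximal clique as a set."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def mcc(adj):
    """Maximal Clique Centrality: sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = dict.fromkeys(adj, 0)
    for clique in maximal_cliques(adj):
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


def top_k(scores, k):
    """Top-k nodes by score; ties broken alphabetically for reproducibility."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {node for node, _ in ranked[:k]}


def consensus_hubs(scored_edges, k, min_score=0.9):
    """Nodes ranked in the top k by every method (Degree, Closeness, MCC)."""
    adj = build_graph(scored_edges, min_score)
    methods = [degree(adj), closeness(adj), mcc(adj)]
    return set.intersection(*(top_k(scores, k) for scores in methods))


# Hypothetical toy network (not STRING data); the B-E edge falls below 0.9.
edges = [("A", "B", 0.95), ("A", "C", 0.97), ("B", "C", 0.93), ("A", "D", 0.92),
         ("C", "D", 0.91), ("D", "E", 0.96), ("B", "E", 0.40)]
hubs = consensus_hubs(edges, k=2)
```

Intersecting the top-k lists trades recall for robustness: a node must look central under several different notions of centrality to be reported as a hub.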

Figure 7.

Key protein-protein interactions (PPI) and hub proteins associated with COVID-19. (A) PPI network retrieved from the STRING database using the top 100 significant genes. (B) Hub proteins identified using different algorithms available in cytoHubba.

Transcriptional regulation analysis

The analysis of the TF-gene regulatory network using the JASPAR and ChEA databases provided a critical approach to understanding the complex regulatory mechanisms of COVID-19 pathogenesis and host response. Figures 8A and B depict the TF-gene networks that regulate the expression of specific genes in COVID-19 patients. The identified TFs, shown as dark-olive-green nodes, include AR, CREB1, E2F1, FLI1, FOXC1, FOXL1, GATA2, GATA3, HNF4A, IRF8, IRF9, JUN, MEF2A, MYC, NFIC, NFKB1, NFYA, POU2F2, POU5F1, PPARG, RELA, RUNX1, SOX2, SPI1, SREBF1, STAT3, TEAD1, TFAP2A, TP63, USF2, and YY1, whereas the pink nodes represent non-TF target genes. In COVID-19 patients, TFs such as NFKB1 (NF-κB1 in some literature) and RELA (RelA in some literature) are central to activating genes that drive the inflammatory response, exacerbating severe symptoms and acute respiratory distress syndrome (ARDS); targeting these TFs may help modulate excessive inflammation.⁴⁸ SARS-CoV-2-induced enhancement of STAT3 activity, coupled with STAT1 inhibition, modulates the expression and activity of various genes and proteins involved in immune cell differentiation, cytokine production, and inflammation, playing a crucial role in the severe immune dysregulation observed in COVID-19.^49,50 Similarly, IRF8, a TF activated by IFN-γ and TLR signaling, regulates immune cells and the expression of key inflammatory cytokines.⁵¹ Overall, this analysis of the TF-gene regulatory network highlighted important TFs and their target genes involved in COVID-19 pathogenesis and host response, providing potential targets for therapeutic intervention and candidate diagnostic biomarkers.
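Conceptually, ranking TFs in a TF-gene network comes down to counting, for each TF, how many genes of interest it is annotated to regulate. The sketch below assumes a hypothetical list of (TF, target) pairs, such as might be exported from a JASPAR- or ChEA-derived resource. It is not NetworkAnalyst's implementation, and the TF and gene names are placeholders.

```python
from collections import defaultdict


def rank_tfs(tf_target_pairs, genes_of_interest):
    """Rank TFs by how many genes of interest they are annotated to regulate.

    tf_target_pairs: iterable of (tf, target_gene) pairs (hypothetical input format).
    """
    genes = set(genes_of_interest)
    targets = defaultdict(set)
    for tf, target in tf_target_pairs:
        if target in genes:          # ignore targets outside the gene list
            targets[tf].add(target)  # sets make duplicate annotations harmless
    # Most targets first; ties broken alphabetically.
    return sorted(((tf, len(t)) for tf, t in targets.items()),
                  key=lambda item: (-item[1], item[0]))


pairs = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G2"), ("TF2", "G9"),
         ("TF3", "G3"), ("TF1", "G1")]
ranking = rank_tfs(pairs, ["G1", "G2", "G3"])
```

The target count is simply each TF's out-degree in the bipartite TF-gene network restricted to the genes of interest, which is why highly connected regulators stand out in the network view.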

Figure 8.

TF-gene networks identified using NetworkAnalyst. (A) Visualization of the TF-gene network generated with JASPAR. (B) Visualization of the TF-gene network generated with the ChEA database. Dark-olive-green ellipses represent TFs, while pink ellipses represent target genes (non-TF).

Discussion

Although significant research has investigated the genetic basis of different diseases and their interactions with human cells, a crucial gap persists in computational methods capable of retrieving the maximum number of biomarker genes while establishing their biological relevance. Closing this gap is critical for understanding the variability of a disease such as COVID-19, particularly regarding severity and outcomes (Figure 1). Our study introduced a novel attention-based dynamic dual-model method, termed DyGAF, to identify and rank the most significant genes in response to COVID-19 infection. DyGAF facilitates comparative gene-expression analysis between healthy individuals and COVID-19 patients. In our study, we used samples collected from nasopharyngeal swabs of COVID-19 patients and healthy individuals (Figure 3A), followed by preprocessing, modeling, and analysis (Figure 3B). The DyGAF model demonstrated superior accuracy compared with traditional methods such as DEA and RF feature selection (Table 1 and Figure 5). This heightened accuracy can be attributed to several key principles inherent to the DyGAF approach. First, DyGAF uses a dual-model framework that integrates both independent and dependent analyses of gene significance. Model A focuses on the individual contribution of each gene, ensuring the identification of genes with strong independent signals. Model B, on the other hand, examines genes within their broader regulatory context, capturing gene interactions and the collective roles of genes. Both models incorporate attention mechanisms and together form DyGAF, which enhances its ability to prioritize the most relevant genes. This dynamic focus improves the interpretability and relevance of the identified genes, and DyGAF's performance is further supported by combining the final weights of each gene using a new metric.
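The dual-model idea can be illustrated with a minimal NumPy sketch of attention-derived gene importance. It is not DyGAF's architecture: the scoring functions, the random untrained parameters, and the convex combination controlled by `alpha` are all hypothetical stand-ins (DyGAF combines the final weights with its own metric, which is not reproduced here). Model-A-style scores depend only on each gene's own expression, whereas Model-B-style scores are computed from all genes through a dense mixing matrix, so gene interactions can influence each gene's weight.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def independent_attention(X, u, c, v):
    """Model-A-style weights: each gene's score uses only that gene's expression."""
    scores = v * np.tanh(X * u + c)  # elementwise, shape (n_samples, n_genes)
    return softmax(scores, axis=1).mean(axis=0)


def dependent_attention(X, W, b):
    """Model-B-style weights: each gene's score is a function of all genes."""
    scores = np.tanh(X @ W + b)      # shape (n_samples, n_genes)
    return softmax(scores, axis=1).mean(axis=0)


def combined_importance(X, params, alpha=0.5):
    """Hypothetical combination rule: convex mix of the two attention profiles."""
    a = independent_attention(X, params["u"], params["c"], params["v"])
    d = dependent_attention(X, params["W"], params["b"])
    return alpha * a + (1.0 - alpha) * d


rng = np.random.default_rng(0)
n_samples, n_genes = 8, 5
X = rng.normal(size=(n_samples, n_genes))  # stand-in for normalized expression
params = {
    "u": rng.normal(size=n_genes), "c": rng.normal(size=n_genes),
    "v": rng.normal(size=n_genes),
    "W": rng.normal(size=(n_genes, n_genes)), "b": rng.normal(size=n_genes),
}
importance = combined_importance(X, params)
ranking = np.argsort(-importance)  # gene indices, most important first
```

In a trained model, the parameters would be fitted jointly with a classifier. Averaging the per-sample attention rows gives one importance value per gene, and because each row sums to 1, the combined vector also sums to 1 and can be ranked directly.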
Furthermore, empirical results, including higher MI scores and improved classification accuracy, validate DyGAF's efficacy in delivering a more informative and accurate analysis than other methods (Tables S1 and S2 in Supplemental File 1). This comprehensive approach enables DyGAF to leverage the strengths of both independent and dependent evaluations, leading to a more nuanced and precise identification of significant genes.
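Assuming MI here denotes the mutual information between a gene's expression and the COVID-19/healthy label, a simple plug-in estimate can be sketched as below. Equal-width binning is an illustrative assumption; library estimators such as scikit-learn's `mutual_info_classif` use a nearest-neighbor method for continuous features instead.

```python
from collections import Counter
from math import log


def discretize(values, n_bins=3):
    """Equal-width binning of a continuous expression vector (illustrative choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) places the maximum value in the last bin rather than bin n_bins.
    return [min(int((x - lo) / width), n_bins - 1) for x in values]


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())
```

For a single gene, `mutual_information(discretize(expression), labels)` is 0 when the binned expression carries no information about the label and reaches log(2) nats for a balanced binary label that it predicts perfectly.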

Through the application of DyGAF, we have identified several novel insights into the genetic mechanisms underlying COVID-19. Notably, DyGAF identified unique GO terms that were not highlighted by traditional methods. For instance, GO terms such as “Regulation of Monocyte Chemotactic Protein-1 Production” and “Positive Regulation of T Cell Cytokine Production” provide new avenues for understanding the immune response in COVID-19. These findings suggest that DyGAF can uncover previously unrecognized gene interactions and pathways that play important roles in disease progression and severity. Furthermore, DyGAF's holistic approach to analyzing gene networks allows for the identification of key regulatory nodes within these networks. This insight is critical for developing targeted therapeutic interventions and improving our understanding of the complex biological processes involved in COVID-19 and other infectious diseases. More discussion can be found in the Additional Discussions of Supplemental File 1. The discovery of these novel GO terms and pathways highlights DyGAF's potential contribution to genomics and infectious disease research.

DyGAF also identified the KEGG pathway “Coronavirus disease” with greater statistical significance and a higher number of enriched genes than the other methods (adjusted P = 1.09e-06 with 11/232 genes, compared with 10/232 for DEA and 7/232 for RF; Table 2). Moreover, both DyGAF and DEA identified a greater number of significant COVID-19-related pathways in the WikiPathways database than RF (Table 3). The identified pathways, such as the “Network Map of SARS-CoV-2 Signaling Pathway” and “SARS-CoV-2 Innate Immunity Evasion and Cell-Specific Immune Response,” provide detailed insights into how the virus evades host immune defenses and triggers subsequent cellular responses.⁵² Further details about these pathways can be found in the Additional Discussions of Supplemental File 1. In addition, the protein-protein interaction (PPI) analysis using DyGAF biomarkers identified crucial hub proteins such as histones,⁵³ ASF1A,⁵⁴ OAS enzymes,⁵⁵ DDX58,⁵⁶ and CXCL13,⁵⁷ which play significant roles in the host's immune response to SARS-CoV-2. Further discussion can be found in Supplemental File 1.

The accuracy and reliability of DyGAF are contingent on the availability of comprehensive genomic data. DyGAF uses raw read-count gene expression data, which can be obtained for the host species of interest, making it widely applicable. Its performance may improve further with larger and higher-quality datasets. However, the computational complexity of DyGAF, particularly the integration of attention mechanisms and a dual-model framework, can be resource-intensive, requiring significant computational power and expertise. Another limitation is the potential for overfitting, particularly with small sample sizes. Although we implemented various strategies to mitigate overfitting, such as cross-validation and hyperparameter tuning, the risk cannot be entirely eliminated. Future work should focus on refining these techniques and exploring additional methods to further reduce the risk of overfitting. Finally, while DyGAF provides valuable insights into gene interactions and regulatory networks, the interpretation of these results requires careful consideration. The biological relevance of identified genes and pathways must be validated through experimental studies, and the potential for false positives or negatives should be critically assessed.
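Cross-validation, mentioned above as one safeguard against overfitting, is usually run in stratified form so that each fold preserves the class balance of a small dataset. A minimal sketch of stratified k-fold splitting follows; k = 5 and the round-robin assignment are assumptions, since the fold settings are not stated in this section. scikit-learn's `StratifiedKFold` provides an equivalent, well-tested implementation.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the k folds.
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    all_idx = set(range(len(labels)))
    for fold in folds:
        yield sorted(all_idx - set(fold)), sorted(fold)


labels = [1] * 10 + [0] * 5  # hypothetical: 10 COVID-19 samples, 5 controls
splits = list(stratified_kfold(labels, k=5))
```

Each sample appears in exactly one test fold, and every fold here receives 2 positive and 1 control sample, so no fold is evaluated on a single class.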

Conclusions

In conclusion, the DyGAF model offers a robust and innovative approach to gene significance analysis, particularly for identifying biomarkers of infectious diseases within complex genomic data. In addition, DyGAF's superior diagnostic performance, characterized by exceptional sensitivity, underscores its value in disease outbreak management by supporting the rapid detection of infection for timely intervention. By leveraging GO enrichment and pathway analyses, DyGAF effectively identified essential genes and outlined host-pathogen interactions, emphasizing the critical roles of host defense and viral evasion mechanisms. The analyses revealed pivotal biological processes, including interferon signaling, cytokine production, and immune regulation. These mechanisms are promising therapeutic targets for enhancing viral clearance and reducing inflammatory responses. In addition, the model's identification of hub proteins highlights their roles in processes such as neutrophil extracellular trap (NET) formation and cytokine signaling modulation, pinpointing potential targets for mitigating viral replication and managing immune responses. Overall, DyGAF provides a solid foundation for future research, guiding the development of more effective diagnostics, treatments, and preventive strategies against infectious diseases.

Supplemental Material

sj-docx-1-bbi-10.1177_11779322251325390 – Supplemental material for Dynamic Gene Attention Focus (DyGAF): Enhancing Biomarker Identification Through Dual-Model Attention Networks

Supplemental material, sj-docx-1-bbi-10.1177_11779322251325390 for Dynamic Gene Attention Focus (DyGAF): Enhancing Biomarker Identification Through Dual-Model Attention Networks by Md Khairul Islam, Himanshu Wagh and Hairong Wei in Bioinformatics and Biology Insights


sj-xlsx-2-bbi-10.1177_11779322251325390 – Supplemental material for Dynamic Gene Attention Focus (DyGAF): Enhancing Biomarker Identification Through Dual-Model Attention Networks

Supplemental material, sj-xlsx-2-bbi-10.1177_11779322251325390 for Dynamic Gene Attention Focus (DyGAF): Enhancing Biomarker Identification Through Dual-Model Attention Networks by Md Khairul Islam, Himanshu Wagh and Hairong Wei in Bioinformatics and Biology Insights


Supplemental material for this article is available online.

References

1. Cevik, Kuppalli, Kindrachuk, Peiris. Virology, transmission, and pathogenesis of SARS-CoV-2. BMJ. 2020;371:m3862.
2. Bridges, Vladar, Huang, Mason RJ. Respiratory epithelial cell responses to SARS-CoV-2 in COVID-19. Thorax. 2022;77:203-209.
3. Karki, Kanneganti TD. The “cytokine storm”: molecular mechanisms and therapeutic prospects. Trends Immunol. 2021;42:681-705.
4. Farahani, Niknam, Mohammadi Amirabad, et al. Molecular pathways involved in COVID-19 and potential pathway-based therapeutic targets. Biomed Pharmacother. 2022;145:112420.
5. Paludan, Mogensen TH. Innate immunological pathways in COVID-19 pathogenesis. Sci Immunol. 2022;7:eabm5505.
6. Potere, Batticciotto, Vecchié, et al. The role of IL-6 and IL-6 blockade in COVID-19. Expert Rev Clin Immunol. 2021;17:601-618.
7. Albright, Mick, Sanchez-Guerrero, et al. A 2-gene host signature for improved accuracy of COVID-19 diagnosis agnostic to viral variants. mSystems. 2023;8:e0067122.
8. Martinez-Diz, Morales-Álvarez, Garcia-Iglesias, et al. Analyzing the role of ACE2, AR, MX1 and TMPRSS2 genetic markers for COVID-19 severity. Hum Genomics. 2023;17:50.
9. Peterson, Baran, Bhattacharya, et al. Gene expression risk scores for COVID-19 illness severity. J Infect Dis. 2023;227:322-331.
10. Eslami, Jalili. The role of environmental factors to transmission of SARS-CoV-2 (COVID-19). AMB Express. 2020;10:92.
11. Mwiinde, Siankwilimba, Sakala, Banda, Michelo. Climatic and environmental factors influencing COVID-19 transmission—an African perspective. Trop Med Infect Dis. 2022;7:433.
12. Love, Huber, Anders. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.
13. Tibshirani. Regression shrinkage and selection via the lasso. J R Stat Soc B Stat Method. 1996;58:267-288.
14. Wold, Esbensen, Geladi. Principal component analysis. Chemometr Intell Lab Syst. 1987;2:37-52.
15. Kaufman, Rousseeuw PJ. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley; 2009.
16. Arslan. Machine learning methods for covid-19 prediction using human genomic data. Proceedings. 2021;74:20.
17. Krishanthi, Jayetileke, Liu, Wang Y-G. Enhancing feature selection optimization for COVID-19 microarray data. COVID. 2023;3:1336-1355.
18. Potamias, Gkoublia, Kanterakis. The two-stage molecular scenery of SARS-CoV-2 infection with implications to disease severity: an in-silico quest. Front Immunol. 2023;14:1251067.
19. Singh, Choudhari, Nema, Khan AA. ACE2 and TMPRSS2 polymorphisms in various diseases with special reference to its impact on COVID-19 disease. Microb Pathog. 2021;150:104621.
20. Kaforou, Rodriguez-Manzano, et al. Discovery and validation of a three-gene signature to distinguish COVID-19 and other viral infections in emergency infectious disease presentations: a case-control and observational cohort study. Lancet Microbe. 2021;2:e594-e603.
21. Kousathanas, Pairo-Castineira, Rawlik, et al. Whole-genome sequencing reveals host factors underlying critical COVID-19. Nature. 2022;607:97-103.
22. Cano-Gamez, Burnham, Goh, et al. An immune dysfunction score for stratification of patients with acute infection based on whole-blood gene expression. Sci Transl Med. 2022;14:eabq4433.
23. Bray, Pimentel, Melsted, Pachter. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34:525-527.
24. Soneson, Love, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res. 2015;4:1521.
25. Robinson, Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25.
26. Robinson, McCarthy, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139-140.
27. Bahdanau. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
28. Vaswani, Shazeer, Parmar, et al. Attention is all you need. Adv Neural Inf Process Syst. 2017;30:1-11.
29. Zhang, Niu. ACE2 and COVID-19 and the resulting ARDS. Postgrad Med J. 2020;96:403-407.
30. Ascierto, Wei. IL-6 modulation for COVID-19: the right patients at the right time. J Immunother Cancer. 2021;9:e002285.
31. Bizzotto, Sanchis, Abbate, et al. SARS-CoV-2 infection boosts MX1 antiviral effector in COVID-19 patients. iScience. 2020;23:101585.
32. Singh, Lanchantin, Sekhon. Attend and predict: understanding gene regulation by selective attention on chromatin. Adv Neural Inf Process Syst. 2017;30:6785-6795.
33. Shaw, Uszkoreit, Vaswani. Self-attention with relative position representations. arXiv preprint arXiv:1803.02155, 2018.
34. Škrlj, Džeroski, Lavrač, Petkovič. Feature importance estimation with self-attention networks. arXiv preprint arXiv:2002.04464, 2020.
35. Przewlocka-Rus, Kryjak. Power-of-two quantized YOLO network for pedestrian detection with dynamic vision sensor. In: 2023 26th Euromicro conference on digital system design (DSD), Golem, Albania, 6-8 September 2023.
36. Arik SÖ, Pfister. Tabnet: attentive interpretable tabular learning. Proc AAAI Conf Artif Intell. 2021;35:6679-6687.
37. Kong, Munoz Medina. A unified fast gradient clipping framework for DP-SGD. Adv Neural Inf Process Syst. 2024;36:1-12.
38. Novakovsky, Dexter, Libbrecht, Wasserman, Mostafavi. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat Rev Genet. 2023;24:125-137.
39. Utkin, Konstantinov AV. Attention-based random forest and contamination model. Neural Netw. 2022;154:346-359.
40. Kuleshov, Jones, Rouillard, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44:W90-W97.
41. Szklarczyk, Gable, Nastou, et al. The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021;49:D605-D612.
42. Saito, Smoot, Ono, et al. A travel guide to Cytoscape plugins. Nat Methods. 2012;9:1069-1076.
43. Chin, Chen, Lin CY. CytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst Biol. 2014;8(suppl 4):S11-S17.
44. Zhou, Soufan, Ewald, Hancock, Basu, Xia. NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis. Nucleic Acids Res. 2019;47:W234-W241.
45. Khan, Fornes, Stigliani, et al. JASPAR 2018: update of the open-access database of transcription factor binding profiles and its web framework. Nucleic Acids Res. 2018;46:D260-D266.
46. Lachmann, Krishnan, Berger, Mazloom, Ma’ayan. ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics. 2010;26:2438-2444.
47. Aboul-Fotouh, Mahmoud, Elnahas, Habib, Abdelraouf SM. What are the current anti-COVID-19 drugs? From traditional to smart molecular mechanisms. Virol J. 2023;20:241.
48. Gudowska-Sawczuk, Mroczko. The role of nuclear factor Kappa B (NF-κB) in development and treatment of COVID-19. Int J Mol Sci. 2022;23:5283.
49. Matsuyama, Kubli, Yoshinaga, Pfeffer, Mak TW. An aberrant STAT pathway is central to COVID-19. Cell Death Differ. 2020;27:3209-3225.
50. Jafarzadeh, Nemati, Jafarzadeh. Contribution of STAT3 to the pathogenesis of COVID-19. Microb Pathog. 2021;154:104836.
51. Kong, Anderson, Lee, et al. Cutting edge: autoantigen Ro52 is an interferon inducible E3 ligase that ubiquitinates IRF-8 and enhances cytokine expression in macrophages. J Immunol. 2007;179:26-30.
52. Kasuga, Zhu, Jang, Yoo JS. Innate immune sensing of coronavirus and viral evasion strategies. Exp Mol Med. 2021;53:723-736.
53. Hong, Yang, Zou, et al. Histones released by NETosis enhance the infectivity of SARS-CoV-2 by bridging the spike protein subunit 2 and sialic acid on host cells. Cell Mol Immunol. 2022;19:577-587.
54. Wei, Alfajaro, DeWeirdt, et al. Genome-wide CRISPR screens reveal host factors critical for SARS-CoV-2 infection. Cell. 2021;184:76-91.
55. Banday, Stanifer, Florez-Vargas, et al. Genetic regulation of OAS1 nonsense-mediated decay underlies association with COVID-19 hospitalization in patients of European and African ancestries. Nat Genet. 2022;54:1103-1116.
56. Vanderboom, Mun D-G, Madugundu, et al. Proteomic signature of host response to SARS-CoV-2 infection in the nasopharynx. Mol Cell Proteomics. 2021;20:100134.
57. Perreau, Suffiotti, Marques-Vidal, et al. The cytokines HGF and CXCL13 predict the severity and the mortality in COVID-19 patients. Nat Commun. 2021;12:4888.
