Abstract
Objective
To investigate the signature genes of fatty acid metabolism and their association with immune cells in pulmonary arterial hypertension (PAH).
Methods
Fatty acid metabolism-related genes were obtained from the GeneCards database. In this retrospective study, a PAH-related dataset was downloaded from the Gene Expression Omnibus database and analyzed to identify differentially expressed genes (DEGs). Weighted gene co-expression network analysis (WGCNA) and machine learning algorithms, including least absolute shrinkage and selection operator (LASSO) and random forest, were used to identify the signature genes. Diagnostic efficiency was assessed by receiver operating characteristic (ROC) curve analysis and a nomogram. Immune cell infiltration was subsequently classified using CIBERSORT.
Results
In total, 817 DEGs were screened from the GSE33463 dataset. The data were clustered into six modules via WGCNA, and the MEdarkred module was significantly related to PAH. The LASSO and random forest algorithms identified five signature genes: ARV1, KCNJ2, PEX11B, PITPNC1, and SCO1. The areas under the ROC curves of these signature genes were 0.917, 0.934, 0.947, 0.963, and 0.940, respectively. CIBERSORT suggested these signature genes may participate in immune cell infiltration.
Conclusions
ARV1, KCNJ2, PEX11B, PITPNC1, and SCO1 show remarkable diagnostic performance in PAH and are involved in immune cell infiltration.
Introduction
Pulmonary arterial hypertension (PAH) affects nearly 1% of the global population and is defined by a mean pulmonary arterial pressure > 20 mmHg at rest as assessed by right heart catheterization.1,2 The symptoms of PAH are non-specific and are mainly related to progressive right ventricular dysfunction resulting from progressive pulmonary vasculopathy. Although the pulmonary hemodynamics, exercise tolerance, and quality of life of PAH patients have improved considerably with advances in diagnosis and treatment, PAH remains an incurable disease.3,4 Because existing treatments can only slow the progression of PAH without preventing or reversing it, exploring the causes of PAH heterogeneity through molecular typing and identifying corresponding biomarkers is a promising strategy for achieving precision medicine.
PAH is a disease characterized by metabolic dysregulation, and abnormalities in fatty acid metabolism have been observed in PAH patients.5,6 Recently, Chen et al. revealed dysregulation of lipid metabolism in PAH patients compared with healthy subjects. Lipid metabolism and fatty acid oxidation metabolites were found to be closely associated with PAH. 7 In addition, there is evidence of increased lipid deposition in both the lung parenchyma and the pulmonary vasculature of patients with PAH. 8 Previous studies have shown that circulating free fatty acids and long-chain acylcarnitines are elevated in PAH patients compared with controls. 9 Zhao et al. observed increased accumulation of free fatty acid products in PAH lung tissues compared with control lungs. 10 They also demonstrated an increase in omega-oxidation of fatty acids and upregulation of lipid oxidation in PAH lungs. 10 Recent studies have also demonstrated upregulation of genes related to fatty acid uptake and processing and to β-oxidation in idiopathic PAH patients. 11
In this study, we compared differentially expressed fatty acid metabolism-related genes (FAMRGs) and their immune characteristics between PAH patients and controls. Machine learning algorithms were used to identify the signature genes. The diagnostic model was established using a nomogram and receiver operating characteristic (ROC) curves. The results of our study provide insight into the use of fatty acid metabolism as a therapeutic target for PAH.
Materials and methods
Data source
In this retrospective study, the PAH-related GSE33463 dataset and the corresponding sample grouping information were downloaded from the Gene Expression Omnibus database (https://www.ncbi.nlm.nih.gov/geo/). Among the GSE33463 data, 71 samples (PAH: control = 30: 41) were used for analysis. The study was approved by the Ethics Committee of Yuyao People’s Hospital of Zhejiang Province and was performed in accordance with the Declaration of Helsinki. All patient details have been de-identified. This is a retrospective study, and patient information was obtained from a publicly available database (GEO database); therefore, informed consent was not required. The reporting of this study conforms to the STROBE guidelines. 12 In addition, 1205 FAMRGs were obtained from the GeneCards database (https://www.genecards.org/) using the keyword “fatty acid metabolism” (Supplementary Table 1).
Identification of differentially expressed genes (DEGs)
Using the limma package 13 in R (www.r-project.org), DEGs between PAH patients and controls were identified according to the following criteria: p < 0.05 and |log2 fold change| > 0.5. A volcano plot was generated to display the DEGs, and the top 50 upregulated and top 50 downregulated DEGs were displayed using a heatmap.
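The filtering rule can be illustrated with a minimal Python sketch. It is not the authors' R code: the gene names and values are made up, and the fold-change cutoff is assumed to apply on the log2 scale, which is the scale on which limma normally reports fold changes.

```python
# Hypothetical limma-style result rows: (gene, log2 fold change, p value).
# All names and numbers are invented for illustration.
results = [
    ("GENE_A", 1.20, 0.0004),
    ("GENE_B", -0.80, 0.0100),
    ("GENE_C", 0.30, 0.0010),   # significant, but effect size below cutoff
    ("GENE_D", -1.50, 0.2000),  # large effect, but not significant
]

P_CUTOFF = 0.05
LFC_CUTOFF = 0.5  # assumption: threshold is on the log2 scale


def classify(log_fc, p_value):
    """Label one gene as 'up', 'down', or 'ns' (not significant)."""
    if p_value < P_CUTOFF and abs(log_fc) > LFC_CUTOFF:
        return "up" if log_fc > 0 else "down"
    return "ns"


labels = {gene: classify(lfc, p) for gene, lfc, p in results}
up = [g for g, lab in labels.items() if lab == "up"]
down = [g for g, lab in labels.items() if lab == "down"]
print("upregulated:", up)
print("downregulated:", down)
```

Both conditions must hold at once, so a gene with a small p value but a small effect (GENE_C) is not counted as a DEG.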
Weighted gene co-expression network analysis (WGCNA)
Based on the scale-free topology criterion, the co-expression network in the GSE33463 cohort was constructed using WGCNA. The pickSoftThreshold function of the WGCNA package 14 was used to calculate the soft threshold power and adjacencies. The adjacency matrix was then converted into a topological overlap matrix, and the corresponding dissimilarity was calculated to perform hierarchical clustering analysis. The dynamic tree cutting method with a minimum module size of 20 was used to identify the co-expressed gene modules. We then measured the association between the gene modules and PAH patients via gene significance values and module membership values and finally identified the key modules.
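The adjacency and topological overlap steps can be sketched in plain Python as follows. This is an illustrative reimplementation of the standard unsigned-network formulas, not the WGCNA package itself; the toy expression values and the function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)|^beta with a zero diagonal."""
    n = len(expr)
    return [[0.0 if i == j else abs(pearson(expr[i], expr[j])) ** beta
             for j in range(n)] for i in range(n)]


def tom_dissimilarity(adj):
    """Return 1 - TOM, the distance used for hierarchical clustering."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            diss[i][j] = 1.0 - tom
    return diss


# Toy data: 3 genes (rows) measured in 4 samples (columns).
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],   # nearly proportional to gene 0
    [4.0, 1.0, 3.0, 2.0],   # weakly related to the others
]
diss = tom_dissimilarity(adjacency(expr, beta=4))  # power 4, as in the Results
print(round(diss[0][1], 3), round(diss[0][2], 3))
```

Raising correlations to the soft threshold power suppresses weak correlations, so the two co-expressed genes end up much closer in TOM distance than either is to the third gene.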
Signature gene identification and functional enrichment analysis
Core DEGs were identified by the intersection of DEGs, FAMRGs, and key module genes and visualized using the ggvenn package. Subsequently, Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analysis of the core DEGs were performed with the clusterProfiler package,15–18 using the ggplot2 package for display.
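As a rough illustration of these two steps, the sketch below intersects three gene sets and then computes a one-sided hypergeometric p value, which is the kind of over-representation test that enrichment tools such as clusterProfiler apply to each term. The gene names, set sizes, and the function name `hypergeom_pvalue` are hypothetical.

```python
from math import comb

# Hypothetical gene sets; real inputs would be the DEG list, the
# GeneCards FAMRG list, and the key WGCNA module.
degs = {"G1", "G2", "G3", "G4"}
famrgs = {"G2", "G3", "G4", "G5"}
module_genes = {"G3", "G4", "G6"}

# Core DEGs are the genes present in all three sets.
core = degs & famrgs & module_genes
print(sorted(core))


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes among n
    selected genes when K of the N background genes carry the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy enrichment: 4 core genes, 2 of them in a term that annotates
# 5 of 20 background genes.
p = hypergeom_pvalue(k=2, n=4, K=5, N=20)
print(round(p, 4))
```

In practice the p values from all tested terms would also be adjusted for multiple testing before interpretation.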
Two machine learning algorithms, least absolute shrinkage and selection operator (LASSO) and random forest, were applied to screen signature genes. LASSO analysis was implemented using the glmnet package, with the penalty parameter selected by 10-fold cross-validation. 19 In addition, the random forest package was used to classify core DEGs. The random forest model determined the optimal number of variables by calculating the average error rate of core DEGs. 20 We then calculated the error rate for each of 1 to 500 trees and determined the optimal number of trees based on the lowest error rate. A random forest model was built once these parameters were determined. Finally, the feature importance score of each core DEG was determined, and genes with an importance value greater than 1 were selected. The genes selected by both machine learning algorithms were considered the signature genes of patients with PAH. The area under the ROC curve (AUC) was used to evaluate the diagnostic efficiency of these signature genes. The pROC package 21 was used to perform the ROC curve analysis.
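The final selection and evaluation steps can be sketched as follows. The example takes two hypothetical gene lists (standing in for the LASSO and random forest selections), intersects them, and computes a single-gene AUC from its rank-based definition: the probability that a randomly chosen PAH sample scores higher than a randomly chosen control. The data and names are invented, and this is not the pROC implementation.

```python
# Hypothetical outputs of the two selection methods.
lasso_selected = {"GENE_A", "GENE_B", "GENE_C"}
rf_selected = {"GENE_B", "GENE_C", "GENE_D"}
signature = lasso_selected & rf_selected
print(sorted(signature))


def auc(scores_pos, scores_neg):
    """AUC as the fraction of (case, control) pairs ranked correctly;
    ties count as half a correct pair."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy expression values of one gene in cases and controls.
pah = [7.1, 6.8, 6.5, 5.9]
control = [6.0, 5.5, 5.2, 6.6]

auc_up = auc(pah, control)
print(auc_up)

# For a gene that is lower in cases, the raw AUC falls below 0.5;
# reversing the direction of the comparison gives 1 - AUC.
auc_reversed = auc(control, pah)
print(auc_reversed)
```

An AUC of 0.5 corresponds to chance-level discrimination, and values near 1 indicate that the gene separates the two groups almost perfectly in the analyzed samples.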
Construction and verification of a nomogram
The nomogram of signature genes was constructed using the rms package. The predictive power of the nomogram model was assessed using ROC curve analysis.
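To show the idea behind a nomogram, the sketch below converts a logistic regression model into a points scale: the predictor with the widest effect range spans 0 to 100 points, the others are scaled proportionally, and the total points map back to a predicted probability. The intercept, coefficients, and expression ranges are hypothetical, and the code illustrates the general principle rather than the rms package's implementation.

```python
import math

# Hypothetical fitted logistic regression model (not from the study).
intercept = -1.0
coefs = {"GENE_A": 0.8, "GENE_B": -0.5}
ranges = {"GENE_A": (4.0, 9.0), "GENE_B": (2.0, 6.0)}  # observed min, max


def span(gene):
    """Range of the gene's contribution to the linear predictor."""
    lo, hi = ranges[gene]
    return abs(coefs[gene]) * (hi - lo)


# The most influential predictor defines the 0-100 points axis.
points_per_unit = 100.0 / max(span(g) for g in coefs)


def reference(gene):
    """Value that earns zero points (lowest contribution in the range)."""
    lo, hi = ranges[gene]
    return lo if coefs[gene] >= 0 else hi


def points(gene, value):
    return coefs[gene] * (value - reference(gene)) * points_per_unit


def total_points(profile):
    return sum(points(g, profile[g]) for g in coefs)


def probability_from_points(total):
    """Map total points back to a probability via the logistic function."""
    base = intercept + sum(coefs[g] * reference(g) for g in coefs)
    lp = base + total / points_per_unit
    return 1.0 / (1.0 + math.exp(-lp))


patient = {"GENE_A": 9.0, "GENE_B": 2.0}
total = total_points(patient)
print(round(total, 1), round(probability_from_points(total), 4))
```

Reading a nomogram by hand follows the same arithmetic: look up the points for each gene, add them, and read the probability from the total points axis.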
Immune cell infiltration
CIBERSORT, a method based on linear support vector regression, was used to deconvolute the expression matrix into 22 human immune cell subtypes and to estimate the proportions of these cell types in the 71 samples from patients with PAH and controls.22,23 A score heatmap of the 22 immune cell types was generated according to the scores of each immune cell type in the two groups.
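The sketch below illustrates only the last step of a CIBERSORT-style deconvolution: turning raw regression weights for one sample into cell-type fractions by setting negative weights to zero and rescaling so the fractions sum to 1. The support vector regression fit itself is not shown, and the raw weights and the function name are hypothetical.

```python
def weights_to_fractions(raw_weights):
    """Clip negative weights to zero and normalize to relative fractions."""
    clipped = {cell: max(w, 0.0) for cell, w in raw_weights.items()}
    total = sum(clipped.values())
    if total == 0.0:
        raise ValueError("all weights are non-positive; cannot normalize")
    return {cell: w / total for cell, w in clipped.items()}


# Hypothetical regression weights for one sample (4 of the 22 cell types).
raw = {
    "Monocytes": 0.30,
    "T cells CD8": 0.15,
    "Eosinophils": -0.05,        # negative weight, treated as absent
    "Mast cells resting": 0.05,
}

fractions = weights_to_fractions(raw)
for cell, frac in fractions.items():
    print(cell, round(frac, 2))
```

Because the output is normalized within each sample, the values describe relative rather than absolute abundance, which matters when comparing cell types between PAH patients and controls.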
Statistical analysis
All statistical analyses in this study were performed using R software (version 4.1.2). Unless otherwise stated, a value of p < 0.05 was considered statistically significant, and all p values were two-tailed. The flow chart of this study is shown in Figure 1.

Figure 1. Flow chart of this study. GEO, Gene Expression Omnibus; WGCNA, weighted gene co-expression network analysis; PAH, pulmonary arterial hypertension; DEGs, differentially expressed genes; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; ROC, receiver operating characteristic.
Results
Identification of DEGs between PAH patients and controls
A total of 817 DEGs were obtained from the PAH versus control comparison group, including 472 upregulated genes and 346 downregulated genes (Figure 2(a)). A heatmap was constructed to display the top 50 upregulated and top 50 downregulated DEGs between PAH patients and controls (Figure 2(b)).

Figure 2. Identification of the differentially expressed genes (DEGs) in pulmonary arterial hypertension (PAH) patients. (a) Volcano plot showing the expression of DEGs between PAH patients and controls and (b) heatmap displaying the top 50 upregulated and top 50 downregulated DEGs.
Construction of the weighted gene co-expression network
WGCNA was conducted to identify the key modules significantly correlated with PAH. Of the 71 samples in the GSE33463 dataset, 6 outlier samples were excluded (Figure 3(a)). The optimal soft threshold power was determined as 4 with a scale-free index of 0.9 as well as a relatively favorable average connectivity (Figure 3(b)). The cluster dendrogram is shown in Figure 3(c). Finally, the data were clustered into six modules (Figure 3(d)). The association between each module and PAH patients was calculated. The MEdarkred module was significantly associated with PAH patients (correlation = 0.91, p < 0.0001) (Figure 3(e)). The MEdarkred module contained 6259 genes and was considered a key module associated with PAH patients. A Venn diagram was constructed to demonstrate the intersection of DEGs, FAMRGs, and the MEdarkred module. In total, 52 core DEGs were identified (Figure 3(f)).

Figure 3. Weighted gene co-expression network analysis (WGCNA) of the GSE33463 dataset and identification of core differentially expressed genes (DEGs). (a) Sample cluster dendrogram of 41 control samples and 30 pulmonary arterial hypertension (PAH) samples based on their expression profile. (b) The soft threshold power and mean connectivity of WGCNA. (c) Cluster dendrogram of WGCNA. (d) Heatmap of the module-trait correlations. (e) Scatter plot of correlations between genes and traits in the MEdarkred module and (f) Venn diagram showing the intersection of DEGs, fatty acid metabolism-related genes (FAMRGs), and the MEdarkred module.
Functional enrichment analysis
The GO analysis consisted of three categories (Figure 4(a)), specifically biological process (BP), cellular component (CC), and molecular function (MF). The BP analysis showed that fatty acid metabolic process, negative regulation of response to external stimulus, and regulation of cell–cell adhesion were significantly enriched (p < 0.05). In the CC analysis, platelet alpha granule, RNA polymerase II transcription regulator complex, and platelet alpha granule membrane occupied the top three ranks. In addition, fatty acid synthase activity played an important role in the MF category. As shown in the KEGG analysis, the top three enriched pathways of the core DEGs were mainly osteoclast differentiation, hepatitis C, and lipid and atherosclerosis (Figure 4(b)).

Figure 4. Functional enrichment analysis of core differentially expressed genes (DEGs). (a) Functional enrichment in biological process (BP), cellular component (CC), and molecular function (MF) analysis and (b) Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of core DEGs.
Identification of signature genes via the LASSO and random forest algorithms
Two machine learning algorithms were used to identify signature genes among the core DEGs in patients with PAH. The LASSO analysis identified 10 candidate genes (Figure 5(a), (b)), while the random forest analysis selected 12 candidate genes with a relative importance greater than 1 (Figure 5(c), (d)). Finally, five signature genes were identified by the intersection of these two algorithms: ARV1, KCNJ2, PEX11B, PITPNC1, and SCO1 (Figure 5(e)).

Figure 5. Machine learning algorithms for signature gene selection. (a) Penalty plot of the least absolute shrinkage and selection operator (LASSO) model with error bars denoting standard errors. (b) LASSO coefficient path plot showing that the coefficients shrank as the penalty parameter λ increased. (c) The error rate confidence intervals for the random forest model. (d) Genes with a relative importance value greater than 1 in the random forest model and (e) Venn diagram showing five signature genes shared by the above two algorithms.
Diagnostic efficacy of signature genes in predicting PAH
Expression of the selected signature genes was significantly different between PAH patients and controls, with ARV1, KCNJ2, PEX11B, and SCO1 being highly expressed in PAH patients and PITPNC1 exhibiting low expression, indicating that these genes may play a role in PAH (p values: 3.8e−11 for ARV1, 2.8e−12 for KCNJ2, 1.6e−10 for PEX11B, 1.3e−14 for PITPNC1, and 1.1e−12 for SCO1) (Figures 6(a)–(e)). Moreover, the AUC values of these signature genes were 0.917 for ARV1, 0.934 for KCNJ2, 0.947 for PEX11B, 0.963 for PITPNC1, and 0.940 for SCO1 (Figures 6(f)–(j)). A nomogram based on the five signature genes (ARV1, KCNJ2, PEX11B, PITPNC1, and SCO1) was constructed (Figure 7(a)). ROC curve analysis revealed that the nomogram model based on the five signature genes exhibited better predictive performance than the individual genes (Figure 7(b)). These results suggest that the identified signature genes have high diagnostic efficiency for PAH in this dataset.

Figure 6. Performance of the signature genes. (a–e) The expression of signature genes in PAH patients and controls and (f–j) ROC curves showing the diagnostic performance of the signature genes. AUC, area under the receiver operating characteristic curve.

Figure 7. Establishment of the nomogram model for pulmonary arterial hypertension (PAH) patients. (a) Nomogram model of PAH and (b) receiver operating characteristic curve showing the diagnostic performance of the five signature genes.
Immune cell infiltration
The CIBERSORT algorithm was used to assess differences in the immune microenvironment. We quantified the levels of 22 immune cell types to evaluate the immune landscape in PAH patients and controls (Figure 8(a)). Compared with controls, PAH patients exhibited greater infiltration of monocytes, M0 macrophages, M2 macrophages, resting mast cells, plasma cells, CD8 T cells, and T regulatory cells (Tregs) and less infiltration of CD4 naive T cells, eosinophils, CD4 memory resting T cells, and activated mast cells (Figure 8(b)). Correlation analysis revealed that ARV1 had a significant positive correlation with Tregs (p < 0.05) and PEX11B had the most significant negative correlation with eosinophils (p < 0.001) (Figure 8(c)).

Figure 8. Immune infiltration analysis of pulmonary arterial hypertension (PAH) patients and controls using CIBERSORT. (a) The relative abundances of 22 infiltrating immune cell types in PAH and control samples. (b) Immune cell infiltration in PAH and control samples and (c) association between signature genes and immune cell types with significantly different infiltration. *p < 0.05, **p < 0.01, and ***p < 0.001.
Discussion
PAH is characterized by increased resistance of the pulmonary vasculature system, which eventually leads to increased resting pulmonary artery pressure and ultimately right heart failure, which causes death in most patients with PAH.24,25 Current treatments for PAH do not prevent progression or cure the disease; the 5-year mortality rate of PAH remains as high as 40%. 26 Currently, increasing evidence indicates that fatty acid metabolism is involved in the pathogenesis of PAH by affecting endothelial cell function. The efflux of fatty acid to perivascular cells, such as pulmonary arterial smooth muscle cells and fibroblasts, promotes pulmonary vascular remodeling and the development of pulmonary hypertension. In recent years, fatty acid metabolism has emerged as an important metabolic research direction for PAH. 25
Machine learning, a branch of artificial intelligence, has been widely applied in medical research, including disease diagnosis, prognosis, and treatment prediction. In our study, WGCNA, LASSO, and random forest algorithms were employed to identify hub genes for the diagnosis of PAH. We identified five signature genes related to fatty acid metabolism. A five gene-based model was constructed, and ARV1, KCNJ2, PEX11B, PITPNC1, and SCO1 showed good diagnostic value in distinguishing PAH patients from normal controls and thus have potential as PAH biomarkers.
The ARV1 gene encodes an evolutionarily conserved potential lipid transporter that localizes to the endoplasmic reticulum membrane. There is evidence that ARV1 regulates lipid distribution and metabolism in mammals. 27 Ruggles et al. reported that ARV1, as a modulator of cellular fatty acid levels, plays a crucial role in the progression of many lipotoxic diseases through the modulation of fatty acid metabolism. 28 Liu et al. showed that deletion of ARV1 at the cellular level may affect the immune response via glycosylphosphatidylinositol anchored proteins. 29 Our results revealed that ARV1 expression was significantly higher in PAH patients than in control samples.
The KCNJ2 gene is located on chromosome 17q23 and encodes the inward rectifier potassium channel Kir2.1. 30 An association between KCNJ2 and tumor progression has been demonstrated, 31 but its expression and role in PAH have not been studied. In this study, KCNJ2 was upregulated in PAH patients, but further studies are needed to understand the possible pathogenic mechanisms.
PEX11B has been reported to mediate growth and division of mammalian peroxisomes, to be involved in the metabolic processes of inflammation-related substances, such as polyunsaturated fatty acids, and to subsequently activate peroxisome proliferator-activated receptor gamma to inhibit pulmonary artery smooth muscle cell proliferation.32,33 Our results showed that PEX11B expression is significantly increased in PAH patients, suggesting that it plays an important role in PAH progression.
SCO1 fulfills essential roles in cytochrome c oxidase (COX) assembly and the regulation of copper homeostasis. 34 SCO1 is required for COX2 metallization, and mutations in the SCO1 gene cause cardiomyopathy, encephalopathy, and hepatopathy. 35 Hlynialuk et al. reported that lipid homeostasis was significantly disrupted in SCO1-deficient mouse livers. 34 Xie et al. showed that SCO1 constitutively interacts with LKB1 to activate AMPK, thereby promoting mitochondrial biogenesis and fatty acid oxidation. 36 However, the role of SCO1 in PAH and the underlying mechanism remain to be investigated.
PITPNC1 functions as a phospholipid transporter and is of importance in modulating β-oxidation of fatty acids in mitochondria. A previous study showed that PITPNC1 expression, upregulated by omental adipocytes, leads to enhanced fatty acid uptake and oxidation. 37 Liang et al. proposed that PITPNC1 promotes radiotherapy resistance in rectal cancer by inhibiting CD8 T cell immune function through regulation of FASN/CD155. In addition, PITPNC1 was not only involved in radioresistance, but also modulated reprogramming of fatty acid metabolism. 38 We therefore speculate that PITPNC1 may play a key role in the pathogenesis of PAH.
Immune cells are involved in the pathogenesis of PAH through energy acquisition. 39 Pulmonary vascular remodeling in PAH patients and PAH animal models is often accompanied by different degrees of perivascular inflammatory infiltration, including T cells, B cells, macrophages, mast cells, and neutrophils. These findings suggest that these immune cells may play an important role in the process of pulmonary vascular remodeling. 40 Our study found that the immune infiltration profiles of PAH patients and normal controls were significantly different. High proportions of monocytes, M0 macrophages, M2 macrophages, resting mast cells, plasma cells, CD8 T cells, and Tregs were found in PAH patients. Tamosiuniene et al. found that Tregs normally limit vascular injury and may protect against the development of PAH. 41 In animal studies, vascular injury caused infiltration of macrophages, mast cells, and B cells into the lungs, similar to human PAH lesions.39,42 Our findings are similar to those of previous studies and provide a foundation for studying the immune mechanisms of PAH.
Our study has several limitations. First, it relies entirely on publicly available databases, and our results were also based on WGCNA and machine learning algorithms. Although obtaining clinical samples from PAH patients is challenging, we are still striving to obtain these samples for experimental validation to determine the true diagnostic effect of the identified biomarkers. Second, we were unable to investigate the regulatory mechanisms of fatty acid metabolism in PAH within a limited time frame.
Conclusions
The present study identified five signature genes, ARV1, KCNJ2, PEX11B, PITPNC1, and SCO1, that showed prominent value in the diagnosis of PAH. We also investigated immune cell infiltration in PAH, which provides a new perspective on the role of immunity in PAH.
Supplemental Material
sj-pdf-1-imr-10.1177_03000605241277740 - Supplemental material for Identification of fatty acid metabolism signature genes in patients with pulmonary arterial hypertension using WGCNA and machine learning
Supplemental material, sj-pdf-1-imr-10.1177_03000605241277740 for Identification of fatty acid metabolism signature genes in patients with pulmonary arterial hypertension using WGCNA and machine learning by Xibang Liu, Dandan Wu, Chunmiao Bao, Zeen Huang, Weiwei Wang, Lili Sun and Lin Qiu in Journal of International Medical Research
Footnotes
Author contributions
LQ conceived and designed the study. XBL and DDW drafted the manuscript and analyzed the data. CMB, ZEH, WWW, and LLS formatted the figures and the manuscript. All authors have read and approved the final published manuscript.
Declaration of conflicting interests
The authors declare that there is no conflict of interest.
Funding
This work was supported by the Key Project of Yuyao Science and Technology (2022YZD01).
Supplementary material
Supplemental material for this article is available online.
