Abstract
Background
Lung adenocarcinoma is a multifaceted disease with diverse locations and timings of gene mutations, histology, and molecular pathogenesis. Thus, identifying therapeutic target genes for lung adenocarcinoma has become a major challenge.
Method
We downloaded the gene expression profiles of 220 patients with lung adenocarcinoma from the Gene Expression Omnibus database and identified the differentially expressed genes between noncancer tissue and cancer tissue groups. Mendelian randomization analysis was performed using gene expression quantitative trait locus data as the exposure and genome-wide association study summary data for lung adenocarcinoma (ieu-a-965) as the outcome. Sensitivity analysis was used to assess the presence of pleiotropy and heterogeneity in the instrumental variables. Additionally, we performed Mendelian randomization analysis on the genes at the intersection of the differentially expressed genes and the lung adenocarcinoma-related genes. Moreover, gene set enrichment and overall survival analyses were performed on the intersection genes.
Results
We combined Gene Expression Omnibus and genome-wide association study data to identify one upregulated and two downregulated genes associated with lung adenocarcinoma risk, using inverse variance weighted analysis as the primary analytical method. Survival was significantly higher in the groups with high expression of ANGPT1 and CD36 than in those with low expression of these genes. For POU2AF1, the Mendelian randomization results were inconsistent with those of the Kaplan–Meier analysis, and POU2AF1 lacked statistical significance in the GSE130779 cohort.
Conclusion
Our results confirmed two specific target genes, CD36 and ANGPT1, based on Mendelian randomization analysis, providing new insights into the role of these target genes in mediating the development of lung adenocarcinoma.
Introduction
Lung cancer, the most common human malignancy, has a rapidly growing incidence rate and is associated with a heavy socioeconomic burden. Therefore, it poses a serious threat to human health. Although the mortality rate of lung cancer has declined, its incidence rates remain high. 1 Non-small cell lung cancer (NSCLC) accounts for 76% of all the histological subtypes of lung cancer. 2 It encompasses a diverse range of cancer types, with lung adenocarcinoma (LUAD) being the most common subtype. LUAD is a multifaceted disease with diverse locations and timing of gene mutations, histology, and molecular pathogenesis. Consequently, identifying therapeutic target genes for LUAD is a major challenge.3–7
Mendelian randomization (MR) is a data analysis technique used for evaluating causal relationships between the exposure factor of interest and the outcome of concern in epidemiological studies by using genetic variants as instrumental variables (IVs). 8 Disease develops as a result of a complex interplay of many factors such as genes, pathogens, environmental factors, and others. When the effect of an exposure on an outcome cannot be confirmed directly, MR can be used to draw causal inferences. Genome-wide association studies (GWAS) have been used to identify hundreds of thousands or even millions of genetic variants associated with LUAD risk factors, such as gut microbiota, 9 tuberculosis, 10 cardiovascular disease, 11 and circulating docosahexaenoic acid. 12 These data on genetic variants form the basis for MR analysis and are widely used because of their easy availability.13,14 Single-nucleotide polymorphisms (SNPs) are genetic variants formed by a change at a single nucleotide position in the genome, including transitions, transversions, deletions, and insertions. These variants serve as the IVs in MR analyses. 15 MR rests on the assumption that genotypes are fixed and are inherited according to Mendel’s first and second laws. Therefore, we explored whether there is evidence of a causal relationship between specific signature genes and LUAD using the MR approach.
In this study, we analyzed differentially expressed genes (DEGs) in several Gene Expression Omnibus (GEO) datasets and conducted MR analyses using GWAS summary data to investigate the causal effects of specific signature genes on lung cancer. We then identified the intersection genes between the DEGs and the specific signature gene set. Moreover, we conducted MR analysis to identify specific intersection genes associated with lung cancer. Our findings provide a foundation for more accurate prediction and diagnosis of lung cancer.
Material and methods
Data of patients with LUAD
The gene expression profiles of 220 patients with LUAD, with complete transcriptomic and overall survival (OS) information, were obtained from the GEO database (https://www.ncbi.nlm.nih.gov/geo/; GSE43458, GSE62950, GSE130779, GSE140797, and GSE176348) and included in the validation dataset.
DEG analyses
DEGs between the noncancer tissue (n = 77) and cancer tissue (n = 127) groups from the GEO database (GSE43458, GSE62950, GSE140797, and GSE176348) were identified using the “limma” R package (https://www.bioconductor.org/) and visualized using the ggplot2 R package. A heat map of the DEGs was constructed using the pheatmap R package. The screening criteria were |log2(fold change)| ≥ 0.585 and p-value < 0.05.
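The screening rule can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study; it only applies the two stated thresholds to per-gene statistics that are assumed to be already computed. The function name and the example rows are hypothetical. Note that 0.585 is approximately log2(1.5), so the cutoff corresponds to a fold change of about 1.5 in either direction.

```python
import math

# Thresholds quoted in the text; 0.585 is approximately log2(1.5).
LOG2FC_CUTOFF = 0.585
P_CUTOFF = 0.05


def classify_deg(log2_fc, p_value):
    """Return 'up', 'down', or None for one gene under the stated criteria."""
    if p_value >= P_CUTOFF or abs(log2_fc) < LOG2FC_CUTOFF:
        return None
    return "up" if log2_fc > 0 else "down"


# Hypothetical rows (gene label, log2 fold change, p-value); not study data.
example_rows = [
    ("geneA", 1.20, 0.001),   # passes both thresholds, upregulated
    ("geneB", -0.90, 0.020),  # passes both thresholds, downregulated
    ("geneC", 0.30, 0.001),   # fold change too small
    ("geneD", -1.50, 0.200),  # p-value too large
]
calls = {gene: classify_deg(fc, p) for gene, fc, p in example_rows}
```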
GWAS for outcomes
The GWAS dataset for LUAD was obtained from the ieu-a-965 summary dataset, comprising 3442 cases and 14,894 controls of European ancestry (8,881,354 SNPs) from the International Lung Cancer Consortium (https://gwas.mrcieu.ac.uk/).
IV selection
For the MR analysis, gene expression quantitative trait locus (eQTL) IDs were downloaded from the GWAS summary data (https://gwas.mrcieu.ac.uk/). To ensure data reliability and correct estimates, SNP quality control was performed to obtain compliant IVs. We identified independent SNPs from the eQTL summary results (p-value < 5E-08) and performed linkage disequilibrium (LD) clumping (r² < 0.001, window = 10,000 kb) using a reference panel of 1000 Genomes Project European samples (http://www.internationalgenome.org/). Furthermore, we used the “TwoSampleMR” R package to harmonize the exposure and outcome data, eliminating SNPs with allelic inconsistencies between the two samples and palindromic SNPs. The F-statistic was used to evaluate the strength of each SNP. Weak instruments with an F-statistic < 10 were excluded.
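The instrument filter can be sketched as follows. The text does not state how the F-statistic was computed, so this sketch assumes the common per-SNP approximation F = (beta / SE)², where beta and SE are the SNP's effect on the exposure and its standard error. The function names and SNP identifiers are hypothetical.

```python
P_THRESHOLD = 5e-8   # genome-wide significance threshold from the text
F_THRESHOLD = 10.0   # instruments with F < 10 are treated as weak


def f_statistic(beta_exposure, se_exposure):
    """Approximate per-SNP F-statistic (an assumption, not the paper's stated formula)."""
    return (beta_exposure / se_exposure) ** 2


def select_instruments(snps):
    """Keep SNPs that are genome-wide significant and not weak instruments."""
    return [
        snp for snp in snps
        if snp["p"] < P_THRESHOLD
        and f_statistic(snp["beta"], snp["se"]) >= F_THRESHOLD
    ]


# Hypothetical exposure summary statistics; not study data.
candidate_snps = [
    {"id": "rs_hypothetical_1", "beta": 0.30, "se": 0.05, "p": 2e-9},
    {"id": "rs_hypothetical_2", "beta": 0.10, "se": 0.05, "p": 1e-9},  # F = 4, weak
    {"id": "rs_hypothetical_3", "beta": 0.40, "se": 0.05, "p": 1e-3},  # not significant
]
kept_ids = [snp["id"] for snp in select_instruments(candidate_snps)]
```

LD clumping and allele harmonization are not reproduced here because they require a reference panel and the paired outcome data.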
MR analyses
To identify signature genes specific to patients with LUAD, we employed the following five approaches: (a) inverse variance weighted (IVW) analysis; (b) MR–Egger regression; (c) weighted mode (WM) analysis; (d) simple mode analysis; and (e) weighted median (WME) analysis. The IVW method, the most commonly used method for MR analysis, was the primary method for estimating causal effects. 15 The IVW estimate can be regarded as the slope of a weighted linear regression of the SNP–outcome associations on the SNP–exposure associations, with the intercept constrained to zero. When all the selected SNPs are valid IVs, IVW can provide an accurate estimate. 2 Additionally, the MR–Egger regression, simple mode, WM, and WME approaches were used for additional confirmation. p < 0.05 was considered statistically significant.
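The IVW estimator described above can be written out in a few lines. The following is a minimal sketch of the fixed-effect form, assuming per-SNP effect estimates on the exposure and outcome and the standard errors of the outcome effects; it is not the TwoSampleMR implementation, and the input values are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW: weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2."""
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return {"beta": beta, "se": se, "or": math.exp(beta), "p": p_value}


# Hypothetical per-SNP summary statistics; not study data.
beta_exposure = [0.10, 0.20, 0.30]
beta_outcome = [0.05, 0.10, 0.15]
se_outcome = [0.02, 0.02, 0.02]
result = ivw_estimate(beta_exposure, beta_outcome, se_outcome)
```

Because every outcome effect in this toy input is exactly half the exposure effect, the slope is 0.5; the odds ratio is obtained by exponentiating the slope.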
Sensitivity analyses
Sensitivity analyses were performed to detect potential bias and determine the influence of IVs on outcome variables. Heterogeneity was tested using Cochran’s Q test for the main IVW analysis and MR–Egger regression methods. p > 0.05 was considered to demonstrate a lack of heterogeneity in the included IVs and indicate that the influence of heterogeneity on the estimation of the causal effect can be ignored. MR–Egger regression analysis can be used to evaluate the bias generated by gene pleiotropy, 16 and the p-value of the regression intercept can be used to evaluate pleiotropy. p < 0.05 indicates the possible existence of pleiotropy, in which case the IVs were further tested using MR-pleiotropy residual sum and outlier (MR-PRESSO), which can exclude aberrant genetic variants (outliers) and assess the corrected results. 17 Genes for which all five methods described above gave the same odds ratio (OR) direction were selected.
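As an illustration of the heterogeneity test, the sketch below computes Cochran's Q for the IVW model from per-SNP Wald ratios, assuming first-order weights (exposure effect squared divided by the outcome variance). Under the null hypothesis of no heterogeneity, Q is compared with a chi-square distribution with (number of SNPs − 1) degrees of freedom; that p-value step is omitted to keep the example dependency-free. Names and inputs are hypothetical.

```python
def cochran_q_ivw(beta_exposure, beta_outcome, se_outcome):
    """Return (Q, degrees of freedom, IVW estimate) for a set of instruments."""
    # Per-SNP causal estimates (Wald ratios) and their first-order weights.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    # The IVW estimate is the weighted mean of the Wald ratios.
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    # Q sums the weighted squared deviations of each ratio from the IVW estimate.
    q = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
    return q, len(ratios) - 1, beta_ivw


# Hypothetical inputs: two SNPs whose Wald ratios disagree (1.0 versus 0.5).
q_stat, dof, beta_ivw = cochran_q_ivw([0.10, 0.20], [0.10, 0.10], [0.05, 0.05])

# Hypothetical inputs: all SNPs imply the same causal effect, so Q is about zero.
q_null, _, _ = cochran_q_ivw([0.10, 0.20, 0.30], [0.05, 0.10, 0.15], [0.02, 0.02, 0.02])
```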
Intersection genes
We used the “VennDiagram” R package to perform intersection analysis between the DEGs and the LUAD-related genes identified using MR. The intersection genes were then analyzed again using MR and visualized using the “forestploter” R package.
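The intersection step amounts to set operations on gene lists. The sketch below assumes that upregulated DEGs are paired with MR genes whose OR is greater than 1 and downregulated DEGs with MR genes whose OR is less than 1; this pairing is an assumption made for illustration, and the gene labels are hypothetical placeholders rather than study results.

```python
def intersect_by_direction(up_degs, down_degs, mr_or_gt1, mr_or_lt1):
    """Intersect DEG lists with MR gene lists that share the same direction."""
    return {
        "up_and_risk": sorted(set(up_degs) & set(mr_or_gt1)),
        "down_and_protective": sorted(set(down_degs) & set(mr_or_lt1)),
    }


# Hypothetical gene labels; not study data.
overlap = intersect_by_direction(
    up_degs=["G1", "G2", "G3"],
    down_degs=["G4", "G5"],
    mr_or_gt1=["G2", "G5", "G9"],
    mr_or_lt1=["G4", "G8"],
)
```

In this toy input, G5 is downregulated but has an OR above 1, so it falls in neither intersection.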
Results
Differential gene expression of LUAD
The distribution of the four datasets (GSE43458, GSE62950, GSE140797, and GSE176348) after correction for batch effects was visualized using principal component analysis (Figure 1(a)). After normalizing the datasets, we analyzed the DEGs between the noncancer tissue (n = 77) and cancer tissue (n = 127) groups and screened 378 DEGs (83 upregulated and 285 downregulated) (Figure 1(b)). The LUAD dataset (ieu-a-965) was downloaded from the GWAS database, and we compiled the eQTL data of 19,943 genes to be used as exposure data, which were subjected to MR analysis. We selected genes related to LUAD based on the following criteria: p < 0.05 on IVW analysis and a consistent direction of the ORs across all five methods. We then categorized the related genes into MR_or > 1 and MR_or < 1 groups according to the ORs. Then, overlapping genes between the DEGs and LUAD-related genes obtained using MR analysis were assessed using Venn diagram analysis.
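The gene selection rule can be expressed compactly. The sketch below is a minimal illustration that assumes one OR per method and the IVW p-value are available for each gene; the method keys, function names, and numbers are hypothetical.

```python
def is_luad_related(ors_by_method, ivw_p, alpha=0.05):
    """True if the IVW p-value is significant and all methods agree on OR direction."""
    if ivw_p >= alpha:
        return False
    values = list(ors_by_method.values())
    return all(v > 1 for v in values) or all(v < 1 for v in values)


def or_group(ivw_or):
    """Label a selected gene by the direction of its IVW odds ratio."""
    return "MR_or > 1" if ivw_or > 1 else "MR_or < 1"


# Hypothetical odds ratios from the five methods; not study data.
consistent = {"ivw": 0.88, "egger": 0.80, "weighted_median": 0.90,
              "simple_mode": 0.93, "weighted_mode": 0.91}
mixed = {"ivw": 0.88, "egger": 1.10, "weighted_median": 0.90,
         "simple_mode": 0.93, "weighted_mode": 0.91}

selected_consistent = is_luad_related(consistent, ivw_p=0.02)
selected_mixed = is_luad_related(mixed, ivw_p=0.02)
selected_nonsignificant = is_luad_related(consistent, ivw_p=0.20)
```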

Figure 1. Intersection analysis between the differentially expressed genes (DEGs) and lung adenocarcinoma (LUAD)-related genes analyzed using Mendelian randomization. (a) Principal component analysis of the Gene Expression Omnibus (GEO) datasets (GSE43458, GSE62950, GSE140797, and GSE176348) after correction for batch effects. (b) Heatmap of DEGs between noncancer tissue (n = 77) and cancer tissue (n = 127) groups from the GEO database (GSE43458, GSE62950, GSE140797, and GSE176348). (c, d) Venn diagrams of intersection genes between DEGs and specific signature genes.
Intersection genes of LUAD
The overlapping genes identified using this analysis included one upregulated DEG (POU2AF1) and two downregulated DEGs (ANGPT1, CD36) (Figure 1(c) and (d)). In the IVW model, ANGPT1 (OR: 0.874; 95% confidence interval: 0.789–0.968; p = 0.01) and CD36 (OR: 0.883; 95% confidence interval: 0.789–0.988; p = 0.031) were identified as protective factors, whereas POU2AF1 (OR: 1.534; 95% confidence interval: 1.110–2.119; p = 0.01) was associated with an increased risk of LUAD (Figure 2(a)). The causal estimates were consistent among the five MR models. Cochran’s Q test indicated no heterogeneity in the IVW and MR–Egger models, and the MR–Egger intercept indicated a lack of directional pleiotropy; together, these results support the reliability of the above causal relationships. Leave-one-out analysis showed that the estimated effects did not depend on specific SNPs (Figure 2(b) and (c)).
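The relationship between an OR, its confidence interval, and its p-value can be checked with a short calculation. The sketch below assumes a Wald-type 95% interval that is symmetric on the log-odds scale and a normal approximation for the p-value; these are assumptions, since the text does not state how the intervals were derived. The ANGPT1 values are taken from the IVW results reported above.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se):
    """Convert a log-odds estimate and its SE to (OR, lower, upper)."""
    return (math.exp(beta),
            math.exp(beta - Z_95 * se),
            math.exp(beta + Z_95 * se))


def p_from_or_ci(odds_ratio, lower, upper):
    """Recover an approximate two-sided p-value from a reported OR and 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# ANGPT1 IVW result as reported in the text: OR 0.874 (0.789 to 0.968).
p_angpt1 = p_from_or_ci(0.874, 0.789, 0.968)

# A null effect (beta = 0) gives OR = 1 with an interval that straddles 1.
null_or, null_lower, null_upper = odds_ratio_ci(0.0, 0.1)
```

Under these assumptions, the recovered p-value can be compared with the reported one as a quick consistency check on the interval.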

Figure 2. Mendelian randomization (MR) analysis of the intersection genes. (a) Forest plot of the association of lung adenocarcinoma (LUAD) with intersection genes based on MR analysis. (b) Plots of the effect size of each single-nucleotide polymorphism (SNP) of intersection genes. (c) Leave-one-out plots for the causal effect of intersection genes on LUAD.
Gene set enrichment analysis (GSEA) of intersected genes
To explore the biological functions and pathways of the screened intersection genes, we performed GSEA on the upregulated (POU2AF1) and downregulated (ANGPT1, CD36) genes. In the high-expression groups of ANGPT1 and CD36, the two genes negatively correlated with the risk of LUAD, the enriched pathways included “asthma” and “drug metabolism cytochrome P450,” which are closely related to the regulation of the inflammatory microenvironment in lung cancer. 18 Additionally, “intestinal immune network for IgA production,” which is significantly dysregulated in lung metastases of colorectal cancer, 19 as well as “retinol metabolism,” “systemic lupus erythematosus,” “complement and coagulation cascades,” “lysosome,” and “PPAR signaling pathway,” were also enriched. The signaling pathways enriched in the low-expression groups were “base excision repair,” “cell cycle,” “DNA replication,” “pyrimidine metabolism,” and “spliceosome” (Figure 3(a) and (b)). These pathways are related to the DNA repair, proliferation, and cell cycle of lung cancer cells. In addition, the top five signaling pathways enriched in the group with high expression of POU2AF1, which is positively associated with the risk of LUAD, were “cell adhesion molecules (CAMs),” “chemokine signaling pathway,” “cytokine–cytokine receptor interaction,” “natural killer cell mediated cytotoxicity,” and “primary immunodeficiency.” These pathways are related to the migration and invasion of lung cancer cells. The pathways enriched in the low-expression group were “drug metabolism cytochrome P450,” “fatty acid metabolism,” “O-glycan biosynthesis,” “peroxisome,” and “retinol metabolism” (Figure 3(c)). Figure 3(d) shows the locations of these three genes on the chromosome map.

Figure 3. (a–c) Gene set enrichment analysis of intersection genes based on Kyoto Encyclopedia of Genes and Genomes pathways in the low-expression and high-expression groups. Samples were grouped by the expression level of each target gene: those with lower-than-average expression were classified as the low-expression group, and those with higher-than-average expression as the high-expression group. (d) Location of intersection genes on the chromosome.
Validation of intersected genes
DEGs associated with LUAD were successfully screened, and their associations with LUAD risk were validated using MR analysis. We then performed survival analysis and further validation using the GSE130779 dataset for POU2AF1, ANGPT1, and CD36. Although the MR results for POU2AF1 were inconsistent with those of the Kaplan–Meier analysis, and POU2AF1 lacked statistical significance in the GSE130779 cohort (Figure 4(a) and (d)), survival was significantly higher in the groups with high expression of ANGPT1 and CD36 than in the groups with low expression of these genes (ANGPT1: p = 5 × 10⁻⁷; CD36: p = 4 × 10⁻⁸) (Figure 4(b) and (c)). Moreover, the expression of these two genes was significantly higher in the noncancer tissue group than in the LUAD group in the GSE130779 cohort (Figure 4(d)).

Figure 4. Kaplan–Meier overall survival (OS) curves for LUAD patients based on the expression of intersection genes. (a–c) Kaplan–Meier OS curves for patients with LUAD stratified by the expression of each intersection gene. (d) Expression of intersection genes in the GSE130779 dataset. a: p < 0.05; b: p < 0.01; c: p < 0.001.
Discussion
LUAD is the most common subtype of NSCLC. Despite significant progress in clinical and experimental research, the prognosis of patients with LUAD remains unclear. Therefore, it is imperative to identify new diagnostic and therapeutic biomarkers for LUAD. Herein, we performed differential gene expression analysis using GEO data, cross-compared the results with an MR analysis of eQTL genes to screen for genes associated with LUAD risk, and identified two important genes with favorable prognostic and cancer-suppressing functions.
Evidence suggests that CD36 and ANGPT1 are strongly associated with lung cancer risk. CD36 is a scavenger receptor class B type 2 (SR-B2), a transmembrane glycoprotein that is expressed on the cell surface in a variety of cell types, including dendritic cells, microvascular endothelial cells (MVECs), platelets, monocytes/macrophages, and adipocytes. 20 Therefore, CD36 is expressed in tumor tissues not only by tumor cells but also by stroma, immune cells, and MVECs, and its expression depends on the tumor stage and cell type. Low CD36 expression in certain primary tumors (e.g. colon, breast, pancreatic, and ovarian cancers) is associated with higher metastatic grade and poor prognosis. 21 CD36 may also attenuate angiogenesis by binding to thrombospondin-1 (TSP-1), thereby inducing apoptosis in tumor MVECs or blocking the vascular endothelial growth factor receptor 2 pathway. 22 Several studies have shown that CD36 plays a minor role in primary tumorigenesis but is important in the initiation of the metastatic process. CD36 expression is typically downregulated in the tumor microvasculature, which promotes tumor progression and metastasis. Moreover, in the tumor mesenchyme, the expression of CD36 is defective, and lower levels of CD36 expression cause more aggressive tumors.23–25 Sun et al. have demonstrated that CD36 exhibits low expression and high methylation in lung cancer; additionally, it inhibits the proliferation of lung cancer cells, promotes apoptosis, blocks the cell cycle in the G0/G1 phase, and inhibits cell migration. In vivo experiments have shown that a combination of decitabine (DCTB) and chidamide (CDM) induces demethylation and re-expression of silenced CD36, which inhibits tumor growth. 26
Angiopoietins (ANGPTs) are ligands for the tyrosine kinase receptor (TIE2), and there are two types of ANGPTs: ANGPT1 and ANGPT2.27–29 ANGPT1 is produced by perivascular mural cells as an agonist that stabilizes the integrity of endothelial cell junctions and inhibits the vascular endothelial growth factor (VEGF)-A-induced increase in vascular permeability. 30 ANGPT–TIE signaling is a key regulator of vascular maturation, controlling vascular quiescence, maintenance, and homeostasis. 31 ANGPT1 promotes angiogenesis in normal and developing tissues, whereas it inhibits angiogenesis, tumor growth, and metastasis in cancerous tissues. 32 Additionally, ANGPT1 has been reported as a potential marker for predicting postoperative survival and recurrence in patients with NSCLC. 33 A recent study has demonstrated that alterations in the intronic region of ANGPT1 are detected in cancers and influence ANGPT1 expression levels, which leads to tumor progression, and that high ANGPT1 expression is associated with higher survival in patients with lung cancer,34,35 suggesting that ANGPT1 is a novel lung tumor–suppressor gene. These results were consistent with those of the differential expression and MR analyses based on the GEO database, which indirectly verified the reliability of our analysis.
Although no studies have elucidated the molecular mechanisms by which POU2AF1 is involved in the development, metastasis, and prognosis of LUAD, it may play an important role in other cancers such as breast cancer, 36 renal cell carcinoma, 37 melanoma, 38 lymphomas, and leukemias. 39 POU2AF1 is a prognostic marker or key molecule in the tumorigenesis of these cancers; therefore, it may also be a marker for LUAD. Although our predictive analysis of the current GWAS and gene eQTL data using MR analysis showed that POU2AF1 overexpression may be associated with an increased risk of LUAD, this association is inconsistent with the results obtained using the Kaplan–Meier analysis. The discrepancy between the results of the Kaplan–Meier survival analysis and our MR analysis may be attributed to several factors. First, the GWAS data used in our analyses were derived from European populations, limiting the generalizability of the findings. Expanding the study to include diverse population groups would enhance the robustness and applicability of the results. Second, although the F-statistics of the genetic variants confirmed the validity of the IVs, the statistical power of some findings was relatively limited. This limitation could be attributed to a relatively small sample size or the inherent complexity of genetic variations. Future access to larger GWAS datasets is anticipated to provide more extensive resources, enabling further validation and refinement of our current results. Additionally, in Kaplan–Meier survival analysis, the completeness of data and adequacy of sample size are crucial for ensuring accuracy. Missing data or an insufficient sample size could compromise the reliability of survival analysis outcomes. Collectively, these factors may have contributed to the inconsistencies observed between the Kaplan–Meier and MR analyses. Finally, research on POU2AF1 in LUAD remains limited. Further in-depth experimental studies are required to confirm and elucidate its role and potential effects in the context of LUAD.
CD36 and ANGPT1 have been extensively studied in relation to the basic mechanism of lung cancer. However, such findings are rarely obtained using MR analysis. For example, some studies have found a significant genetic correlation between asthma and eosinophil count, with MR analysis further supporting a key role of the CD36 gene. 40 However, these associations are rarely discussed in the clinical context of lung cancer and warrant further investigation. Although our current study focused on identifying gene–disease associations, we are aware that the application of these findings to therapeutic strategies or prognostic markers is crucial for translating our results into clinical practice. In future work, we aim to explore the potential role of these genes in personalized treatment approaches and prognostic assessment, which could provide new avenues for improving outcomes in LUAD patients. We used the MR framework to assess causal associations between a range of DEGs and LUAD; by utilizing genetic variants as IVs, the MR design could largely reduce residual confounding. In addition, the use of complementary methods for multiplicity and sensitivity analyses allowed us to rigorously assess violations of the MR assumptions. By incorporating related traits from different genes into the MR analysis, we were able to include genetic variants associated with only a single phenotype to limit the bias from potential common genetic effects between phenotypes. However, this study has certain limitations. First, all GWAS data were obtained from European populations, and the consistency of the associations revealed in this study needs to be verified in other populations. Second, although the F-statistics for genetic variation suggested that weak instruments were not used, the statistical power of some results was modest. 
This modest power may be attributed to the relatively small sample size or to genetic variation, and the possible presence of “false negatives” therefore cannot be completely ruled out. Larger GWAS datasets will be needed to further validate these results. Nevertheless, based on a range of quality-controlled statistical methods, the likelihood of pleiotropic bias was low in this study.
Conclusions
Using MR and DEG analyses, we identified three genes, CD36, POU2AF1, and ANGPT1, that are associated with LUAD risk; of these, high expression of CD36 and ANGPT1 was also associated with better survival. These findings suggest that CD36 and ANGPT1 are promising biomarkers for prognosis in LUAD patients. Additionally, these genes may provide new therapeutic targets for personalized treatment strategies, particularly in the context of early detection and treatment-response monitoring. Our results contribute to the growing body of knowledge on the molecular mechanisms of LUAD and offer potential avenues for future clinical and experimental research.
Footnotes
Acknowledgments
The authors would like to thank Mr. Tao Peng for his technical support.
Authors’ contributions
LRY, LC, and JF: Conceptualization and Writing—review & editing
XHP, ZWT: Data curation and Funding acquisition
YL: Formal analysis, Validation, and Software
LRY, TTL: Methodology, Resources, and Project administration
LRY and LTT: Writing—original draft
Data availability statement
The datasets generated in the current study are available in the GWAS summary data (https://gwas.mrcieu.ac.uk/) and the GEO repository (https://www.ncbi.nlm.nih.gov/geo/).
Declaration of conflicting interests
The authors have no relevant financial or nonfinancial interests to disclose.
Funding
This work was supported by the Kunming University of Science and Technology and the First People’s Hospital of Yunnan Province Joint Special Project on Medical Research (Grant number KUST-KH2022002Z), Joint Special Fund of Applied Fundamental Research of Kunming Medical University granted by the Science and Technology Office of Yunnan (Grant number 202301AY070001-299), Yunnan Province Orthopedics and Sports Rehabilitation Clinical Medical Research Center Open Project (Grant number 2022YJZX-GK04), and Yunnan Province Young and Middle-Aged Academic and Technical Leaders Reserve Talent Project (Grant number 202105AC160004). The funding agencies had no role in the study design, study implementation, manuscript writing, or study publication.
