Identification of COVID-19 susceptibility genes by Mendelian randomization

Abstract

Objective

We applied Mendelian Randomization (MR) to investigate potential genetic targets involved in the pathogenesis of COVID-19, aiming to identify causal factors that may contribute to disease susceptibility and severity.

Methods

We aggregated multi-omics quantitative trait loci data and explored the pathogenic targets of COVID-19 using summary data-based Mendelian randomization, colocalization, and MR.

Results

We identified Latent transforming growth factor beta binding protein 2 (LTBP2) and α-1,3-N-acetylgalactosaminyltransferase (ABO) as primary targets for the risk of severe COVID-19. Elevated levels of LTBP2 (odds ratio (OR): 0.53, 95% CI: 0.39–0.70) were associated with reduced risk of severe COVID-19, while ABO (OR: 1.09, 95% CI: 1.06–1.11) were associated with increased risk of severe COVID-19. These effects were consistent with the expression of LTBP2 (OR: 0.58, 95% CI: 0.45–0.75) and the methylation of ABO (cg07241568: OR: 1.20, 95% CI: 1.14–1.26; cg24267699: OR: 1.11, 95% CI: 1.08–1.15) regarding the risk of severe COVID-19.

Conclusions

We have identified several potential genes related to COVID-19 risks, providing a basis for understanding the pathogenesis of COVID-19 and the development of candidate therapeutic drugs.

Keywords

Mendelian randomization COVID-19 gene expression methylation protein

Introduction

The 2019 coronavirus disease (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).¹ As of February 2024, there have been over 770 million confirmed cases of COVID-19 and over 7 million deaths (https://covid19.who.int). While government quarantine measures and vaccine development have reduced the incidence and mortality of COVID-19,¹ the widespread global variation of COVID-19 strains still results in significant heterogeneity among infected individuals in terms of symptoms, severity, and prognosis. Furthermore, the covert transmission by asymptomatic carriers continues to pose challenges for the prevention and treatment of COVID-19.²

Host genetics determine the susceptibility, severity, and prognosis of COVID-19, and the COVID-19 Host Genetics Initiative has identified several risk loci associated with the disease.³ Genome-wide association studies (GWASs) have provided abundant genetic information on COVID-19,⁴ aiding in a deeper understanding of the genetic mechanisms underlying COVID-19 infection and progression. However, due to limitations in recruiting study participants and the lack of extensive clinical metadata, a comprehensive genotype-phenotype interpretation of the research findings is still not feasible.⁵ Furthermore, GWAS cannot infer causal relationships, highlighting the urgent need to explore risk loci associated with COVID-19 infection and progression.

Mendelian randomization (MR) employs genetic variants as instrumental variables to infer causal relationships between exposure and outcome. MR is less susceptible to reverse causation and unknown confounding because genetic variants are randomly assigned at conception and are not influenced by the onset of disease. Therefore, MR has significant advantages in inferring causal effects, which may be difficult to achieve with traditional experimental techniques.⁶ Studies using large-scale GWAS and molecular quantitative trait loci (QTL) data can explore the potential associations between gene expression and COVID-19 risk from the perspectives of expression. Previous studies have only explored the driving factors of COVID-19 through expression quantitative trait loci (eQTLs). However, in the context of rich genetics, the disease process of COVID-19 is heterogeneous and exhibits widespread environmental and temporal variability. Therefore, multi-omics analysis can help develop new and innovative effective drugs. This can provide appropriate targets for the prevention and treatment of COVID-19. Therefore, we employed MR to investigate the potential associations between gene expression, methylation, and protein abundance with the risk of COVID-19.

Method

Study design

Figure 1 illustrates the current research workflow. The current MR analysis utilized publicly available datasets including the COVID-19 Host Genetics Initiative (Release 7, https://www.covid19hg.org/),³ FinnGen study, and other large-scale GWASs. The COVID-19 Host Genetics Initiative comprised GWAS data for very severe respiratory confirmed covid, hospitalized COVID-19, and population COVID-19. The GWAS data for COVID-19 confirmed population were obtained from the r10 release of FinnGen study. We employed the summary data-based Mendelian randomization (SMR) method to assess the relationship between gene expression, methylation, and protein abundance with the risk of COVID-19. The dataset from the COVID-19 Host Genetics Initiative served as the primary discovery dataset, with the FinnGen study dataset serving as a validation dataset to confirm our findings. Finally, the final candidate genes were determined by integrating the MR results from different omics. There was no overlap in samples between the exposure and outcome populations, and all were from European cohorts. The research process was based on the three major assumptions of MR: (1) a significant association between genetic variation and exposure (the correlation hypothesis); (2) genetic variation is unrelated to confounding factors (exclusivity hypothesis); and (3) genetic variation influences outcomes only through exposure (independence hypothesis). This study is reported according to the STROBE-MR statement (Supplementary STROBE-MR checklist table).

Figure 1.

Flowchart of the study design. QTL: quantitative trait loci. eQTL: expression quantitative trait loci; mQTL: methylation quantitative trait loci; pQTL: protein quantitative trait loci; SMR: summary data-based Mendelian randomization; MR: Mendelian randomization; FDR: false discovery rate; PPH4: posterior probability of H4.

Sample and data

We used the integration of multi-omics evidence to elucidate the potential molecular networks underlying COVID-19 occurrence. QTLs can reveal associations between single-nucleotide polymorphisms (SNPs) and gene expression, methylation, and protein abundance. The eQTL dataset with blood gene expression within 1000 kb of the coding sequences was obtained from the eQTLGen Consortium, comprising 31,684 individuals.⁷ The blood methylation quantitative trait loci (mQTL) dataset was extracted from the study by McRae et al., included 1980 individuals.⁸ The dataset of protein quantitative trait loci (pQTL) was obtained from the UK Biobank - Pharma Proteomics Project (UKB-PPP), including plasma proteomic profiles of 54,219 UKB participants.⁹ We employed the default genome-wide significance p-value threshold of 5 × 10⁻⁸ to select the most relevant eQTLs, mQTLs, and pQTLs as instrumental variables for the SMR analysis.

The summary GWAS data for COVID-19 outcomes were sourced from the COVID-19 Host Genetics Initiative and the FinnGen Study, with participants of European ancestry. The Release 7 of the COVID-19 Host Genetics Initiative includes very severe respiratory confirmed covid (severe COVID-19), hospitalized COVID-19, and population COVID-19. Severe COVID-19 included 13,769 cases and 1,072,442 controls; hospitalized COVID-19 included 32,519 cases and 2,062,805 controls; population COVID-19 included 122,616 cases and 2,475,240 controls. Since the FinnGen study was unable to access the summary data for very severe respiratory diseases, hospitalized COVID-19, and confirmed COVID-19, the comfirmed COVID-19 GWAS was used as the validation cohort, consisting of 2856 cases and 405,232 controls.

Measures of variables

COVID-19 severity was defined using three phenotypes provided by the COVID-19 Host Genetics Initiative: (1) very severe respiratory confirmed COVID-19 vs population, (2) hospitalized COVID-19 vs non-hospitalized, and (3) COVID-19 vs population (Table S1).

Data analysis procedure

SMR analysis (SMR v 1.3.1) was employed to assess the relationship between eQTLs, mQTLs, and pQTLs and the risk of COVID-19 (https://yanglab.westlake.edu.cn/software/smr/#Overview).¹⁰ All analyses were conducted using the default settings of SMR. The odds ratio (OR) estimates for the association of QTLs with COVID-19 risk were calculated based on β coefficients. The effect size (β) of the variants reflects the direction of change in gene expression, methylation, and protein expression. To control for genome-wide type I errors, we adjusted p-values using the false discovery rate (FDR). The heterogeneity in dependent instruments (HEIDI) test was employed to distinguish pleiotropy from linkage disequilibrium (LD), where p-HEIDI < 0.01 was considered indicative of potential pleiotropy and thus excluded from the analysis.

For significant MR results, we investigated the shared causal variants between eQTLs, mQTLs, and pQTLs with COVID-19. Based on published studies, we selected a window of 100 kb for colocalization analysis. The posterior probability of hypothesis 4 (PPH4) was used as evidence of colocalization between QTLs and GWAS. For protein abundance, PPH4 ≥ 0.7 represents strong evidence of colocalization, while PP.H4 between 0.5 and 0.7 indicates moderate evidence of colocalization.^11,12

Since proteins were the ultimate products of gene expression, further validation was needed to confirm the significant proteins identified in the SMR analysis. We used proteins from the UKB-PPP as exposures and employed traditional two-sample MR to validate the significance of these proteins. In the UKB-PPP, SNPs significantly associated with proteins (p < 5 × 10⁻⁸) were extracted, and LD r² < 0.1 was used to determine the extent of LD between SNPs, with a clustering distance set at 100 kb. Additionally, SNPs and proteins within the major histocompatibility complex region (chr 6: 25.5–34.0 Mb) were excluded due to the complex LD structure.¹³ Subsequently, we excluded weak instrumental variables based on the F-statistic (F < 10) and filtered out proteins exhibiting horizontal pleiotropy. The inverse variance weighted method was employed as the primary outcome of the MR analysis, followed by FDR correction. Therefore, proteins were included in subsequent multi-omics analyses only if they reached significance in both traditional two-sample MR and SMR. MR was performed using R software windows version 4.3.1

Omics evidence integration

Since significant proteins have already been identified based on SMR and two-sample MR, and the regulatory effect of proteins on phenotypes is significantly higher than that of gene expression or methylation, the integration of all multi-omics evidence needs to be based on these significant proteins. In other words, candidate genes need to have a causal association with COVID-19 at the protein level. Based on this principle, we categorized as follows: (1) primary targets were defined as proteins significantly associated with COVID-19 at the level of protein abundance (FDR-p value < 0.05), with a PPH4 > 0.7, and also significantly associated with COVID-19 at the level of methylation or gene expression (FDR-p value < 0.05); (2) secondary targets were defined as proteins significantly associated with COVID-19 (FDR-p value < 0.05) and with a PPH4 > 0.7, or proteins significantly associated with COVID-19 (FDR-p value < 0.05) with a PPH4 ranging from 0.5 to 0.7, and also significantly associated with COVID-19 at the level of methylation or gene expression (FDR-p value < 0.05). (3) Potential targets were defined as proteins significantly associated with COVID-19 at the level of protein abundance (FDR-p value < 0.05) with a PPH4 ranging from 0.5 to 0.7. The final candidate genes for COVID-19 were identified in the omics analysis.

Results

Gene expression and COVID-19

The results of the causal relationship between gene expression and COVID-19 outcomes can be found in Tables S2–S4. After FDR, HEIDI, and colocalization (PPH4 > 0.7), a total of 10, 15, and 10 genes were associated with severe COVID-19, hospitalized COVID-19, and population COVID-19 risk.

For severe COVID-19, ATP11A, latent transforming growth factor beta binding protein 2 (LTBP2), TNFAIP8L1, and NUCB1-AS1 were validated in the FinnGen study (Table S14). For hospitalized COVID-19, PPT2, STARD10, ZNF268, ATP11A, SLC22A31, TNFAIP8L1, and NUCB1-AS1 were validated in the FinnGen study (Table S15). For population COVID-19, NFKBIZ and NUCB1-AS1 were validated in the FinnGen study (Table S16). The specific gene effects are shown in Table 1.

Table 1.

The effect of gene expression on COVID-19 risk was validated by FinnGen.

Disease	Gene	OR(95%CI)	FDR_p value	PPH4
Severe COVID-19	ATP11A	0.29 (0.16–0.52)	0.009	0.99
	LTBP2	0.58 (0.45–0.75)	0.011	0.31
	TNFAIP8L1	2.77 (1.85–4.15)	<0.001	<0.01
	NUCB1-AS1	0.81 (0.73–0.9)	0.022	<0.01
Hospitalized COVID-19
	PPT2	1.27 (1.12–1.43)	0.042	0.14
	STARD10	1.06 (1.03–1.09)	0.045	0.89
	ZNF268	1.25 (1.11–1.41)	0.044	0.77
	ATP11A	0.45 (0.33–0.64)	0.002	0.99
	SLC22A31	1.09 (1.06–1.13)	<0.001	<0.01
	TNFAIP8L1	1.69 (1.31–2.19)	0.019	<0.01
	NUCB1-AS1	0.88 (0.82–0.94)	0.042	<0.01
Population COVID-19
	NFKBIZ	1.14 (1.09–1.19)	<0.001	<0.001
	NUCB1-AS1	0.89 (0.85–0.92)	<0.001	0.11

OR: odds ratio; CI: confidence interval; PPH4: posterior probability of H4; FDR: false discovery rate; LTBP2: latent transforming growth factor beta binding protein 2.

Gene methylation and COVID-19

The results of gene methylation on the causal relationship with COVID-19 outcomes can be found in Table S5–S7. After FDR, HEIDI, and colocalization (PPH4 > 0.7), 21 genes and 30 CpG sites were identified for severe COVID-19, 33 genes and 67 CpG sites for hospitalized COVID-19, and 22 genes and 45 CpG sites for population COVID-19. In the FinnGen validation cohort, a total of 8 genes and 16 CpG sites were successfully validated for severe COVID-19, 7 genes and 15 CpG sites for hospitalized COVID-19, and 10 genes and 14 CpG sites for population COVID-19 (Table S17–19). The specific methylation site effect effects are shown in Table 2. For example, increased methylation at the cg02098565 site of the ACSF3 gene was associated with an increased risk of severe COVID-19 (OR: 1.5, 95% CI: 1.25–1.81), while increased methylation at the cg04308346 site was associated with a decreased risk of severe COVID-19 (OR: 0.68, 95% CI: 0.58–0.79) (Table 2). The same trend persisted in hospitalized COVID-19 and population COVID-19.

Table 2.

The association of gene methylation with the risk of COVID-19 was validated by FinnGen.

Disease	Probe ID	Gene	OR (95%CI)	FDR_p value	PPH4
SevereCOVID-19	cg01771416	NFKBIZ	1.33 (1.15–1.54)	0.042	0.94
	cg03837452	ZBTB11	0.91 (0.87–0.95)	0.016	0.77
	cg10353683	XCR1	3.34 (2.32–4.81)	<0.01	<0.01
	cg03031988	BAT1	0.88 (0.84–0.93)	0.008	<0.01
	cg09878914	BAT1	1.37 (1.19–1.59)	0.01	<0.01
	cg14130272	BAT1	0.88 (0.84–0.93)	0.007	<0.01
	cg21079591	BAT1	0.89 (0.85–0.94)	0.007	<0.01
	cg07241568	ABO	1.2 (1.14–1.26)	<0.01	0.99
	cg24267699	ABO	1.11 (1.08–1.15)	<0.01	0.87
	cg01751047	CDH15	0.9 (0.87–0.93)	<0.01	<0.01
	cg02098565	ACSF3	1.5 (1.25–1.81)	0.009	0.01
	cg02193283	ACSF3	1.31 (1.19–1.45)	<0.01	0.01
	cg04308346	ACSF3	0.68 (0.58–0.79)	0.001	0.03
	cg05448332	CDH15	1.29 (1.17–1.42)	0.001	<0.01
	cg10453148	ACSF3	1.15 (1.09–1.22)	0.001	<0.01
	cg26752725	TULP2	1.19 (1.09–1.3)	0.043	0.83
Hospitalized COVID-19	cg01771416	NFKBIZ	1.31 (1.18–1.47)	0.001	0.96
	cg03837452	ZBTB11	0.91 (0.89–0.94)	<0.01	0.55
	cg05405742	ZBTB11	0.75 (0.65–0.86)	0.025	0.69
	cg05890243	ZBTB11	0.77 (0.68–0.86)	0.004	0.82
	cg10353683	XCR1	2.16 (1.71–2.74)	<0.01	<0.01
	cg21079591	BAT1	0.93 (0.9–0.96)	0.009	0.04
	cg07241568	ABO	1.18 (1.14–1.23)	<0.01	0.91
	cg24267699	ABO	1.11 (1.09–1.14)	<0.01	0.88
	cg01751047	CDH15	0.92 (0.9–0.95)	<0.01	0.48
	cg02193283	ACSF3	1.16 (1.09–1.24)	0.002	<0.01
	cg04308346	ACSF3	0.8 (0.72–0.88)	0.005	0.01
	cg05448332	CDH15	1.21 (1.13–1.3)	<0.01	0.47
	cg06170283	CDH15	1.19 (1.1–1.28)	0.007	0.19
	cg08618878	CDH15	1.28 (1.14–1.45)	0.023	0.79
	cg10453148	ACSF3	1.09 (1.05–1.13)	0.01	<0.01
Population COVID-19	cg03837452	ZBTB11	0.93 (0.92–0.95)	<0.01	0.2
	cg05405742	ZBTB11	0.8 (0.73–0.87)	0.001	0.67
	cg05890243	ZBTB11	0.79 (0.72–0.87)	0.001	0.83
	cg10353683	XCR1	1.21 (1.13–1.29)	<0.01	<0.01
	cg07241568	ABO	1.16 (1.13–1.19)	<0.01	0.88
	cg00062776	TULP2	0.9 (0.87–0.93)	<0.01	0.09
	cg01656853	FUT2	0.92 (0.89–0.95)	0.002	0.03
	cg06705122	PLEKHA4	1.12 (1.06–1.18)	0.047	0.95
	cg15146515	IZUMO1	0.93 (0.9–0.96)	0.006	<0.01
	cg15673069	RASIP1	1.1 (1.05–1.15)	0.023	<0.01
	cg15793258	TNFAIP8L1	0.98 (0.97–0.99)	0.023	<0.01
	cg15974053	HSD17B14	0.94 (0.92–0.96)	<0.01	<0.01
	cg16530498	HSD17B14	0.94 (0.92–0.96)	<0.01	<0.01
	cg26752725	TULP2	1.1 (1.07–1.14)	<0.01	0.76

OR: odds ratio; CI: confidence interval; PPH4: posterior probability of H4; FDR: false discovery rate; ABO: α-1,3-N-acetylgalactosaminyltransferase.

Protein abundance and COVID-19

The protein abundance from the two-sample MR analysis is depicted in Figure S1 A-C. Following the identification of significant proteins through SMR, the results from SMR and the two-sample MR analysis were combined. While these proteins exhibited significance in both SMR and the two-sample MR analysis (Table S8–S13), there was a discrepancy in the effect direction of ACE2 between severe COVID-19 (OR: 1.29 vs. 0.82) and hospitalized COVID-19 (OR: 1.26 vs. 0.92) (Table 3); therefore, ACE2 was excluded from subsequent analyses. Eight, eight, and three proteins were associated with the risk of very severe respiratory disease, hospitalized COVID-19, and confirmed COVID-19, respectively (Tables S8–S10). We used colocalization to filter proteins. After colocalization filtering (PPH4 > 0.5), higher levels of SFTPD (OR: 0.91, 95% CI: 0.88–0.95) and LTBP2 (OR: 0.53, 95% CI: 0.39–0.70) were negatively associated with the risk of severe COVID-19, while elevated levels of GART (OR: 1.60, 95% CI: 1.23–2.07) and α-1,3-N-acetylgalactosaminyltransferase (ABO) (OR: 1.09, 95% CI: 1.06–1.15) were positively associated with the risk of severe COVID-19. Elevated levels of BTN1A1 (OR: 0.82, 95% CI: 0.73–0.91) and SFTPD (OR: 0.94, 95% CI: 0.91–0.96) were negatively associated with the risk of hospitalized COVID-19, while increased levels of RNF43 (OR: 1.38, 95% CI: 1.16–1.64) were positively associated with the risk of hospitalized COVID-19. Higher levels of leucine-rich repeats and calponin homology domain-containing 4 (LRCH4) (OR: 0.71, 95% CI: 0.60–0.83) were negatively associated with the risk of population COVID-19 (Table 4). Additionally, the direction of effect for these significant proteins remained consistent in FinnGen study (Table S20–S22).

Table 3.

Causal association of protein richness with COVID-19 risk.

		SMR		Two-sample MR
Disease	Gene	OR (95%CI)	FDR_p value	OR (95%CI)	FDR_p value
Severe COVID-19	IL10RB	1.18 (1.11–1.25)	<0.001	1.15 (1.08–1.21)	<0.001
	ICAM5	0.91 (0.88–0.95)	0.003	0.84 (0.81–0.86)	<0.001
	GART	1.6 (1.23–2.07)	0.042	1.57 (1.29–1.92)	<0.001
	ABO	1.09 (1.06–1.11)	<0.001	1.09 (1.07–1.11)	<0.001
	KLK1	0.91 (0.88–0.93)	<0.001	0.92 (0.9–0.94)	<0.001
	LRRC37A2	0.89 (0.87–0.92)	<0.001	0.92 (0.91–0.93)	<0.001
	SFTPD	0.91 (0.88–0.95)	<0.001	0.89 (0.87–0.92)	<0.001
	LTBP2	0.53 (0.39–0.7)	0.002	0.75 (0.68–0.82)	<0.001
	ACE2	1.29 (1.15–1.44)	0.002	0.82 (0.73–0.92)	0.006
Hospitalized COVID-19	BTN1A1	0.82 (0.73–0.91)	0.032	0.83 (0.81–0.86)	<0.001
	IL10RB	1.09 (1.05–1.14)	0.004	1.09 (1.05–1.14)	<0.001
	RNF43	1.38 (1.16–1.64)	0.026	1.23 (1.16–1.32)	<0.001
	EFNA1	1.14 (1.08–1.21)	0.003	1.18 (1.12–1.24)	<0.001
	ABO	1.09 (1.07–1.11)	<0.001	1.09 (1.08–1.1)	<0.001
	KLK1	0.94 (0.92–0.96)	<0.001	0.95 (0.93–0.96)	<0.001
	LRRC37A2	0.92 (0.9–0.94)	<0.001	0.94 (0.94–0.95)	<0.001
	SFTPD	0.94 (0.91–0.96)	<0.001	0.91 (0.9–0.93)	<0.001
	ACE2	1.26 (1.14–1.39)	<0.001	0.92 (0.86–0.98)	0.047
Population COVID-19	EFNA1	1.11 (1.07–1.14)	<0.001	1.11 (1.09–1.13)	<0.001
	ABO	1.08 (1.07–1.09)	<0.001	1.08 (1.08–1.09)	<0.001
	LRCH4	0.71 (0.6–0.83)	0.009	0.92 (0.87–0.98)	0.040

MR: Mendelian randomization; SMR: summary data-based Mendelian randomization; OR: odds ratio; CI: confidence interval; FDR: false discovery rate; ABO: α-1,3-N-acetylgalactosaminyltransferase; LRCH4: leucine-rich repeats and calponin homology domain-containing 4.

Table 4.

Genetic prediction of association of genetically encoded proteins with the risk of COVID-19.

	Target	OR (95%CI%)	FDR_p value	PPH4
Severe COVID-19	IL10RB	1.18 (1.11–1.25)	<0.001	<0.01
	ICAM5	0.91 (0.88–0.95)	0.003	<0.001
	GART	1.60 (1.23–2.07)	0.042	0.68
	ABO	1.09 (1.06–1.11)	<0.001	0.54
	KLK1	0.91 (0.88–0.93)	<0.001	0.01
	LRRC37A2	0.89 (0.89–0.92)	<0.001	0.12
	SFTPD	0.91 (0.88–0.95)	<0.001	0.95
	LTBP2	0.53 (0.39–0.70)	0.002	0.98
Hospitalized COVID	BTN1A1	0.82 (0.73–0.91)	0.032	0.81
	IL10RB	1.09 (1.05–1.14)	0.004	<0.01
	RNF43	1.38 (1.16–1.64)	0.026	0.83
	EFNA1	1.14 (1.08–1.21)	0.003	<0.01
	ABO	1.09 (1.07–1.11)	<0.001	0.14
	KLK1	0.94 (0.91–0.96)	<0.001	0.01
	LRRC37A2	0.92 (0.90–0.94)	<0.001	<0.01
	SFTPD	0.94 (0.91–0.96)	<0.001	0.66
COVID vs population	EFNA1	1.11 (1.07–1.14)	<0.001	<0.01
	ABO	1.08 (1.07–1.09)	<0.001	0.02
	LRCH4	0.71 (0.60–0.83)	0.009	0.91

OR: odds ratio; CI: confidence interval; FDR: false discovery rate; PPH4: posterior probability of H4; LTBP2: latent transforming growth factor beta binding protein 2; ABO: α-1,3-N-acetylgalactosaminyltransferase; LRCH4: leucine-rich repeats and calponin homology domain-containing 4.

Integration of omics evidence

In all three COVID-19 outcomes, we identified 7 genes, 32 methylation sites, and 1 protein (Figure S1 D-F). These targets were significant across the three COVID-19 outcomes and exhibited consistent risk effects (Figure S1 G), indicating the stability of our findings and suggesting that common targets have similar effects on the risk of COVID-19. After integrating the multi-omics evidence and validating through FinnGen, we ultimately identified LTBP2 as a primary target for severe COVID-19, with overlapping effects observed in eQTLs and pQTLs. ABO was identified as a secondary target for severe COVID-19, with overlapping effects observed in mQTLs and pQTLs. For targets without overlapping effects, we identified SFTPD as a secondary target for severe COVID-19 and GART as a potential target for severe COVID-19. For hospitalized COVID-19, overlapping omics evidence was not found, but SFTPD was identified as a potential target for hospitalized COVID-19. For population COVID-19, overlapping omics evidence was not found, while LRCH4 was identified as a secondary target for population COVID-19. Additionally, for the omics evidence of severe COVID-19, we further examined the positive correlation between the expression of LTBP2 and ABO and their respective protein levels (Table S23), which is consistent with the risk effects of LTBP2 and ABO on severe COVID-19 (Table S7). Similarly, these associations were validated in FinnGen (Table S24), although not all associations reached statistical significance, the direction of all associations remained consistent, consistent with our observations across omics (Table S1, S4, S7).

Discussion

In this study, we used SMR combined with MR to explore the genetic drivers of COVID-19 for the first time. We found that LTBP2, ABO, SFTPD, and GART were associated with the risk of severe COVID-19. SFTPD was associated with the risk of hospitalized COVID-19, while LRCH4 was associated with the risk of population COVID-19. This provides evidence for potential mechanisms linking gene expression, methylation, protein abundance, and COVID-19. We propose suggestions to improve the problem-oriented drug discovery process, emphasizing best practices for the management of innovative drug development.

The spread and origin of COVID-19 are related to multiple factors, and traditional epidemiological models are inadequate in addressing viral mutations or changes in national containment policies.¹⁴ Unlike previous studies, we incorporated pQTLs instead of solely eQTLs.^15–18 Proteins are the molecules that directly perform biological functions, so pQTLs are closer to the ultimate biological effects of diseases. Moreover, our results differ from the targets obtained by Wu et al. using the cross-methylome omnibus test.¹⁸ There is environmental variability in COVID-19.¹⁹ In addition, we have analyzed the latest COVID-19 data. Compared to the previous version of the analysis,¹⁸ our study has a larger sample size, enhancing our ability to identify reliable genetic associations and make causal inferences. Importantly, technological advancements in analysis provide new insights into drug development.^5,20,21 We used SMR, which provided stronger statistical power and was validated through the MR method in the aforementioned study. Our full exploration of the genetic factors of COVID-19 will contribute to the development of new drugs and further reveal the pathogenic process of COVID-19. In addition, COVID-19 at different stages exhibits unique pathophysiology.⁴ Therefore, we provide comprehensive multi-omics evidence based on different COVID-19 symptoms. The results also reveal distinct driving factors.

We identified an association between LTBP2 and the risk of severe COVID-19, with this relationship supported by combined eQTLs and pQTLs evidence. LTBP2 is critical for maintaining elastic fiber stability and has been implicated in both pulmonary and cardiac fibrosis.²² In COVID-19, insufficient elastic components may exacerbate the excessive elastase-mediated degradation triggered by inflammatory bursts, which in turn reduces lung compliance and elevates the risk of severe disease.²³ This is consistent with our finding that reduced LTBP2 expression correlates with a lower risk of severe COVID-19. Notably, there is clear evidence that elastic tissue destruction elevates the risk of severe disease.²⁴ This is because the inability to maintain normal lung structure impairs lung function or causes respiratory distress,²⁵ progressing to persistent hypoxemia, multi-organ failure, and ultimately death.²⁶ Thus, the present study underscores the potential protective role of LTBP2 in preserving elastic fiber stability during severe COVID-19.

We found that ABO is involved in all three COVID-19 outcomes. ABO gene variations underpin the ABO blood group system and determine blood type. A COVID-19 GWAS showed that higher ABO expression correlates with increased risk of hospitalized COVID-19 (beta = 0.298).²⁷ Additionally, the ABO gene encodes the protein determining ABO blood type. In pQTLs analyses, we observed consistent risk effects of both gene expression and protein abundance on COVID-19, with this consistency maintained across multi-omics analyses. The precise mechanism by which the ABO protein influences COVID-19 risk remains unclear. However, carriers of the A and B alleles express glycosyltransferase activity, which converts the H antigen to A or B antigens, whereas individuals with blood type O lack this enzymatic activity due to a gene deletion (frameshift mutation). This underlies the differences in COVID-19 susceptibility across blood types.⁵

Additionally, several observational insights may provide relevant links: for instance, naturally occurring anti-A antibodies could inhibit spike protein-mediated cell invasion via the ACE2 receptor—a hypothesized entry route for SARS-CoV-2.²⁸ Another potential explanation is that individuals with blood type A have a higher risk of cardiovascular disease—a known risk factor for COVID-19—whereas those with type O have a lower cardiovascular disease risk.²⁹ Additionally, we identified that gene-predicted ABO methylation sites (cg07241568, cg24267699) elevate the risk of severe COVID-19; these sites showed consistent risk effects at the protein abundance level, further reinforcing the robustness of our findings. Previous studies have shown that the gene-predicted ABO methylation site cg24267699 elevates the risk of coronary artery disease,³⁰ which may indirectly heighten the risk of severe COVID-19. However, clear evidence linking methylation of cg07241568 to an elevated risk of severe COVID-19 remains lacking. Nonetheless, our findings provide validation at the protein abundance level.

SFTPD protein abundance was negatively correlated with the risk of severe and hospitalized COVID-19, consistent with the findings of Palmos et al.³¹ While there may be sample differences in blood protein exposure and COVID-19 outcomes, the consistent risk estimates from our current findings are noteworthy. Additionally, SFTPD downregulation impairs host immunity, potentially reducing lung gas exchange function.³² Meanwhile, SFTPD colocalization was higher in severe COVID-19 than in hospitalized cases, indirectly indicating a more prominent role for SFTPD in severe disease. Thus, SFTPD holds potential for preventing severe COVID-19 and alleviating symptoms like respiratory distress.

We also identified that LRCH4 may exert a potential protective effect against COVID-19 in the population. Silencing LRCH4 can attenuate lipopolysaccharide-induced production of multiple cytokines and dampen innate immune responses in vivo.³³ Additionally, lipopolysaccharide contributes to the aggregation of the SARS-CoV-2 spike protein.³⁴ Immune dysregulation plays a critical role in the pathophysiology of SAR-CoV-2, leading to severe lung injury and respiratory failure.³⁵ This suggests that LRCH4 may alleviate the development of COVID-19 by regulating immune responses. However, the precise mechanism underlying LRCH4's role in COVID-19 pathogenesis remains unclear. While multi-omic evidence is lacking, given the regulatory effect of protein abundance on phenotypes, LRCH4 shows promise as a potential therapeutic target for COVID-19.

We identified GART as a potential target for COVID-19. Phosphoribosylglycinamide formyltransferase (GART) is critical to the purine biosynthesis pathway, which is essential for numerous cell functions, including signaling and proliferation.³⁶ Purines are involved in viral RNA synthesis, and reducing purine levels may decrease viral replication, thus inhibiting viral infection.³⁷ In COVID-19, GART is significantly upregulated compared to healthy controls,³⁸ consistent with our findings. Notably, GART's involvement in purine synthesis elevates the risk of COVID-19 onset. Moreover, dietary regulation of purines may aid in controlling COVID-19 infection.³⁷ These findings collectively suggest that GART could serve as a potential therapeutic target for COVID-19.

This study had several strengths. Firstly, our study only included samples of European ancestry, reducing bias caused by genetic backgrounds. Secondly, we utilized MR to provide genomic information on the susceptibility risk of COVID-19,²⁷ including genetic, methylation, and protein abundance data. Previous studies had only been able to provide summarized information on the transcriptome, without assessing the impact of protein abundance. Additionally, we employed the latest version (v7) of the COVID-19 Host Genetics Initiative, which includes a larger number of COVID-19 samples. This enabled us to attain sufficient statistical power to elucidate causal relationships. Thirdly, we utilized MR and SMR to assess the impact of protein abundance on the risk of COVID-19, ensuring the robustness of our findings. Finally, we integrated evidence from multiple omics levels, strengthening the causal association between these genes and the risk of COVID-19.

However, this study still has some limitations. Firstly, the susceptibility factors of these genetic loci for COVID-19 in other ethnicities remain unknown. Secondly, the GWAS did not control for all confounding factors, such as diabetes, obesity, cardiovascular diseases, among others, which could act as risk factors for COVID-19. Thirdly, although we conducted sensitivity analyses to reduce bias, it may not completely eliminate associations caused by horizontal pleiotropy. Moreover, further extensive research is needed to validate the impact of these genes on the risk of COVID-19.

Conclusion

In the context of continuous COVID-19 mutations, we have elucidated the heterogeneous molecular mechanisms of different symptomatic COVID-19 from a multi-omics perspective. And evidence for multi-level targets was proposed based on MR. This contributes to suggestions for improving the problem-oriented drug discovery process, emphasizing best practices for the innovative management of new drugs targeting COVID-19. The future goal is to develop new drugs or repurpose existing drugs targeting the pathogenic sites of COVID-19 to mitigate its pandemic.

Supplemental Material

sj-docx-1-sci-10.1177_00368504251385959 - Supplemental material for Identification of COVID-19 susceptibility genes by Mendelian randomization

Supplemental material, sj-docx-1-sci-10.1177_00368504251385959 for Identification of COVID-19 susceptibility genes by Mendelian randomization by Hao Wang, Keru Ma, Junpeng Wu, Ming Shan and Guoqiang Zhang in Science Progress

Supplemental Material

sj-docx-2-sci-10.1177_00368504251385959 - Supplemental material for Identification of COVID-19 susceptibility genes by Mendelian randomization

Supplemental material, sj-docx-2-sci-10.1177_00368504251385959 for Identification of COVID-19 susceptibility genes by Mendelian randomization by Hao Wang, Keru Ma, Junpeng Wu, Ming Shan and Guoqiang Zhang in Science Progress

Footnotes

ORCID iD

Guoqiang Zhang

Ethical considerations

This study did not require ethical approval as all analysis was based on publicly available aggregated statistics without access to individual data. The included GWAS studies obtained informed consent from the study participants and were approved by the relevant local ethics committees.

Consent to participate

This research has been conducted using published studies and consortia providing publicly available summary statistics. All original studies have been approved by the corresponding ethical review board, and the participants have provided informed consent. In addition, no individual-level data was used in this study. Therefore, no new ethical review board approval was required.

Authors’ contributions

HW and KM designed the study, analyzed and interpreted the data, and drafted the manuscript. HW, KM, JW, and MS analyzed and interpreted the data. QZ conceived and designed the study and revised the manuscript. The authors read and approved the final manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Availability of data and materials

All data were from open access GWAS large data (Supplement table S1). The current MR analysis utilized publicly available datasets including the COVID-19 Host Genetics Initiative (Release 7, https://www.covid19hg.org/) and FinnGen study ().

Supplemental material

Supplemental material for this article is available online.

References

Thomson

. The COVID-19 pandemic: a global natural experiment. Circulation 2020; 142: 14–16.

Barboza

Chambergo-Michilot

Velasquez-Sotomayor

, et al. Assessment and management of asymptomatic COVID-19 infection: a systematic review. Travel Med Infect Dis 2021; 41: 102058.

The COVID-19 Host Genetics Initiative . A global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur J Hum Genet 2020; 28: 715–718.

Pairo-Castineira

Clohisey

Klaric

, et al. Genetic mechanisms of critical illness in COVID-19. Nature 2021; 591: 92–98.

Ellinghaus

Degenhardt

Bujanda

, et al. Genomewide association study of severe COVID-19 with respiratory failure. N Engl J Med 2020; 383: 1522–1534.

Davies

Holmes

Davey Smith

. Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians. BMJ 2018; 362: k601.

Võsa

Claringbould

Westra

, et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat Genet 2021; 53: 1300–1310.

McRae

Marioni

Shah

, et al. Identification of 55,000 replicated DNA methylation QTL. Sci Rep 2018; 8: 17605.

Sun

Chiou

Traylor

, et al. Plasma proteomic associations with genetics and health in the UK biobank. Nature 2023; 622: 329–338.

10.

Zeng

Zhang

, et al. Integrative analysis of omics summary data reveals putative mechanisms underlying complex traits. Nat Commun 2018; 9: 918.

11.

Chen

Ruan

, et al. Therapeutic targets for inflammatory bowel disease: proteome-wide Mendelian randomization and colocalization analyses. EBioMedicine 2023; 89: 104494.

12.

Chen

Ruan

Sun

, et al. Multi-omic insight into the molecular networks of mitochondrial dysfunction in the pathogenesis of inflammatory bowel disease. EBioMedicine 2024; 99: 104934.

13.

Sun

Zhao

Jiang

, et al. Identification of novel protein biomarkers and drug targets for colorectal cancer by integrating human plasma proteome with genome. Genome Med 2023; 15: 75.

14.

Coccia

. Sources, diffusion and prediction in COVID-19 pandemic: lessons learned to face next health emergency. AIMS Public Health 2023; 10: 145–168.

15.

Liu

Yang

Feng

, et al. Mendelian randomization analysis identified genes pleiotropically associated with the risk and prognosis of COVID-19. J Infect 2021; 82: 126–132.

16.

Ying

Jia

, et al. Single-cell transcriptome-wide Mendelian randomization and colocalization reveals immune-mediated regulatory mechanisms and drug targets for COVID-19. EBioMedicine 2025; 113: 105596.

17.

Baranova

Cao

Zhang

. Unraveling risk genes of COVID-19 by multi-omics integrative analyses. Front Med (Lausanne) 2021; 8: 738687.

18.

Zhu

Liu

, et al. An integrative multiomics analysis identifies putative causal genes for COVID-19 severity. Genet Med 2021; 23: 2076–2086.

19.

Núñez-Delgado

Bontempi

Coccia

, et al. SARS-CoV-2 and other pathogenic microorganisms in the environment. Environ Res 2021; 201: 111606.

20.

Coccia

. Problem-driven innovations in drug discovery: co-evolution of the patterns of radical innovation with the evolution of problems. Health Policy Technol 2016; 5: 143–155.

21.

Coccia

MJTA

Management

. Sources of technological innovation: radical and incremental innovation problem-driven to support competitive advantage of firms. Technol Anal Strategic Manag 2017; 29: 1048–1061.

22.

Park

Ranjbarvaziri

Lay

, et al. Genetic regulation of fibroblast activation and proliferation in cardiac fibrosis. Circulation 2018; 138: 1224–1235.

23.

Heinz

. Elastases and elastokines: elastin degradation and its significance in health and disease. Crit Rev Biochem Mol Biol 2020; 55: 252–273.

24.

Naeije

Caravita

. Phenotyping long COVID. Eur Respir J 2021; 58: 2101763.

25.

Zhang

, et al. Discharge may not be the end of treatment: pay attention to pulmonary fibrosis caused by severe COVID-19. J Med Virol 2021; 93: 1378–1386.

26.

Yuki

Fujiogi

Koutsogiannaki

. COVID-19 pathophysiology: a review. Clin Immunol 2020; 215: 108427.

27.

Krishnamoorthy

Cheung

. Transcriptome-wide summary data-based Mendelian randomization analysis reveals 38 novel genes associated with severe COVID-19. J Med Virol 2023; 95: e28162.

28.

Guillon

Clément

Sébille

, et al. Inhibition of the interaction between the SARS-CoV spike protein and its cellular receptor by anti-histo-blood group antibodies. Glycobiology 2008; 18: 1085–1093.

29.

Bayoumi

Vickers

, et al. ABO(H) blood groups and vascular disease: a systematic review and meta-analysis. J Thromb Haemost 2008; 6: 62–69.

30.

Aslibekyan

Agha

Colicino

, et al. Association of methylation signals with incident coronary heart disease in an epigenome-wide assessment of circulating tumor necrosis factor α. JAMA Cardiol 2018; 3: 463–472.

31.

Palmos

Millischer

Menon

, et al. Proteome-wide Mendelian randomization identifies causal links between blood proteins and severe COVID-19. PLoS Genet 2022; 18: e1010042.

32.

Suresh

Mohanty

Avula

, et al. Quantitative proteomics of hamster lung tissues infected with SARS-CoV-2 reveal host factors having implication in the disease pathogenesis and severity. FASEB J 2021; 35: e21713.

33.

Aloor

Azzam

Guardiola

, et al. Leucine-rich repeats and calponin homology containing 4 (Lrch4) regulates the innate immune response. J Biol Chem 2019; 294: 1997–2008.

34.

Petrlova

Samsudin

Bond

, et al. SARS-CoV-2 spike protein aggregation is triggered by bacterial lipopolysaccharide. FEBS Lett 2022; 596: 2566–2575.

35.

Cui

Fang

De Souza

, et al. Innate immune cell activation causes lung fibrosis in a humanized model of long COVID. Proc Natl Acad Sci U S A 2023; 120: e2217199120.

36.

Cong

Huang

, et al. Increased expression of glycinamide ribonucleotide transformylase is associated with a poor prognosis in hepatocellular carcinoma, and it promotes liver cancer cell proliferation. Hum Pathol 2014; 45: 1370–1378.

37.

Morais

AHA

Passos

Maciel

BLL

, et al.

Can probiotics and diet promote beneficial immune modulation and purine control in coronavirus infection?

Nutrients 2020; 12: 1737.

38.

Pahl

Le Coz

, et al. Implicating effector genes at COVID-19 GWAS loci using promoter-focused capture-C in disease-relevant immune cell types. Genome Biol 2022; 23: 125.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.22 MB

0.03 MB