Sage Journals: Discover world-class research

Abstract

Background:

Preeclampsia (PE) is a complication of pregnancy characterized by hypertension, with limited therapeutic options and variable treatment response. Identifying novel drug targets is urgently needed.

Method:

We performed Mendelian randomization (MR) to explore potential drug targets for PE using data from the UK Biobank (nCase = 184, nControl = 361,010) and the FinnGen database (nCase = 7,377, nControl = 211,957). Genetic instruments for pQTLs were obtained from five proteomic studies. Bayesian colocalization analysis and summary-data-based MR (SMR) analysis were performed to assess the causal relationship between two related signals (protein levels and PE risk). We further conducted single-cell type expression analysis and phenome-wide MR. In addition, a protein–protein interaction network of key genes was constructed via the GeneMANIA website.

Results:

At a significance level of p < 5 × 10⁻⁸, MR analysis revealed seven protein-PE pairs. Genetically predicted levels of KDEL (Lys-Asp-Glu-Leu) Containing 2 (KDELC2) and Keratin 18 (KRT18) were positively associated with the risk of PE, whereas those of the other five proteins (Complement Factor B [CFB], FYN Proto-Oncogene [FYN], RAN Binding Protein 1 [RANBP1], Amphoterin-Induced Gene and ORF 1 [AMIGO1], and Arginase 2 [ARG2]) were negatively associated. None of the proteins showed reverse causality. Bayesian colocalization analysis supported the associations of CFB (coloc.susie PPH4 = 1) and KRT18 (coloc.abf PPH4 = 0.74) with PE. The genes encoding KRT18, FYN, ARG2, and RANBP1 were expressed in specific cell types within tissues from patients with PE. Moreover, the examination of single-cell localization provided insights into the extensive distribution of the KRT18 gene, which is highly expressed in the villous cytotrophoblast (VCT) and extravillous trophoblast (EVT) populations.

Conclusion:

Our MR analysis suggested that the plasma proteins KRT18, FYN, RANBP1, AMIGO1, and ARG2 had causal effects on PE risk. These findings indicate that these five proteins, especially KRT18, might be promising druggable targets for PE and warrant further clinical investigation.

Keywords

preeclampsia drug target Mendelian randomization

Introduction

Preeclampsia (PE) is a pregnancy-specific multisystem disorder with an incidence of ∼2–5% that can be diagnosed by new-onset hypertension and one other PE-associated symptom or sign (e.g., proteinuria or widespread end-organ injury) and typically occurs after 20 weeks of gestation.^1,2 PE contributes significantly to maternal and perinatal morbidity and mortality worldwide, underscoring the need for further research. PE can be classified into two major subtypes: (a) early-onset PE (delivery at <34 weeks gestation), which is primarily attributed to uteroplacental ischemia, and (b) late-onset PE (delivery at ≥34 weeks gestation), which is predominantly associated with a metabolic crisis leading to an imbalance between fetal requirements and maternal resources.³ Although the pathophysiological mechanisms of PE remain uncertain, several triggers have been identified in previous studies. The placenta supports the fetus throughout pregnancy; however, abnormal placental development marked by shallow invasion of trophoblast cells into the uterine wall is considered a leading cause of PE.⁴ Immune-mediated placental dysfunction and systemic endothelial injury are the primary pathological factors of PE. Soluble fms-like tyrosine kinase-1 (sFlt-1), placental growth factor (PlGF), endoglin, and microRNAs are potential biomarkers for the prediction and diagnosis of PE, providing opportunities for personalized monitoring and intervention strategies. A prognostic study conducted by Zeisler et al. revealed that an sFlt-1/PlGF ratio ≤38 can accurately rule out the development of PE within 1 week, with a negative predictive value of 99.3%.⁵ Moreover, the INHBA, OPRK1, and TPBG genes were found to be associated with PE, and a predictive model was established.⁶ Despite these advancements, early diagnostic tests and effective treatments for PE are still needed.

Plasma proteins, which can originate from any organ or cell, or even enter the maternal circulation through placental contributions, participate in a range of biological processes (BPs), including signaling, inflammation, and transportation.^7,8 Sarosh et al. demonstrated that circulating plasma antiangiogenic factors such as sFlt-1 were effective biomarkers for risk stratification and PE screening.⁹ Single-nucleotide polymorphisms (SNPs) link protein levels to genetic loci, and publicly available genome-wide association studies (GWASs) have identified protein quantitative trait loci (pQTLs) associated with a multitude of plasma proteins. These pQTLs reflect the circulating levels of plasma proteins and can be used to explore the causality between plasma proteins and PE. Moreover, Mendelian randomization (MR) has been widely used for potential biomarker screening and drug target development. Owing to advances in genetic instruments using SNPs identified by GWASs, MR analysis based on the integration of GWAS and pQTL data has promoted the development of novel therapeutic strategies for many diseases.

In this study, our analysis aimed to evaluate the causal effect of plasma proteins on PE and to identify potential biomarkers for PE. First, we used MR to identify plasma proteins potentially associated with PE risk using pQTL data from five large-scale proteomic studies. The primary findings were subsequently further validated using Bayesian colocalization, summary-data-based MR (SMR), and heterogeneity in dependent instruments (HEIDI) analysis. Moreover, single-cell type expression analysis was performed, and we identified specific cell types on the basis of target protein-coding genes that were enriched in PE tissues. We then constructed a protein–protein interaction (PPI) network to further investigate the therapeutic potential of our plasma protein biomarkers.

Materials and Methods

Study design

Supplementary Figure S1 displays the entire study design. To summarize, we utilized pQTL data from five large-scale proteomic studies and employed a two-phase (discovery and replication) proteome-wide MR framework to investigate the associations of pQTLs with PE.^10–14 The private patient data from datasets are deidentified. Bayesian colocalization, SMR, and HEIDI tests were utilized to validate the causal links between protein biomarkers and PE. Furthermore, single-cell type expression analysis was conducted to identify the specific cell types in which target protein-coding genes were enriched in PE tissues. Finally, a PPI network was constructed using the identified protein biomarkers to further investigate potential therapeutic targets.¹⁵

Data source

The PE data were derived from the UK Biobank (UKBB, https://www.ukbiobank.ac.uk/) and the FinnGen database (https://r10.finngen.fi/). The UKBB dataset (study code O14) includes data from 184 PE patients and 361,010 controls of European ancestry (http://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=O14). The FinnGen dataset, coded O15_PREECLAMPS, includes data from 7,377 PE patients and 211,957 controls of European ancestry (https://storage.googleapis.com/finngen-public-data-r10/summary_stats/finngen_R10_O15_PREECLAMPS.gz). In our MR analysis, the FinnGen dataset was used for the discovery phase, and the UKBB dataset was used for the replication phase. Within the FinnGen dataset, genetic variants related to PE with a p value <5 × 10⁻⁸ and in low linkage disequilibrium with one another (R² < 0.001) were selected as instrumental variables for the reverse MR analysis of PE.

Protein-related MR studies

The pQTLs from the five proteomic studies mentioned were utilized for the selection of genetic instruments. The platform ID of each protein was mapped to gene symbols and unified using annotations from the original studies and manual examinations. Subsequently, SNPs were mapped to the human genome build 37 (GRCh37) to standardize genome coordinates. Genetic instruments and proteins were selected using the following criteria: (i) SNPs related to any protein were selected based on a stringent significance threshold (p < 5 × 10⁻⁸); (ii) owing to the complex linkage disequilibrium structure, SNPs and proteins within the major histocompatibility complex (MHC) region (chr6: 26.0–34.0 Mb) were excluded¹⁵; (iii) independent pQTLs were identified for each protein using linkage disequilibrium clumping (R² < 0.001); and (iv) the strength of the genetic instruments was assessed with R² and F-statistics, where R² is the proportion of protein level variability explained by each genetic instrument, and instruments with an F-statistic less than 10 were filtered out. For proteins that appeared multiple times in the studies, the one with the highest sum of R² values was chosen. Genetic instruments were further divided into two types, cis- or trans-pQTLs, according to the following criteria. A pQTL was considered cis when the leading SNP was within 1 Mb of the transcription start site of the protein-coding gene, whereas those outside this region were considered trans-pQTLs.^10,12,14 Ultimately, all cis-pQTLs identified in the studies were selected as our instrumental variables, incorporating a total of 7,468 cis-pQTLs and 2,968 unique plasma proteins into the analysis.
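The selection rules above can be illustrated with a minimal sketch. The text does not state which formulas were used for R² and the F-statistic, so the common summary-statistic approximations R² ≈ 2·EAF·(1 − EAF)·β² (for a per-SD effect) and F = R²(N − 2)/(1 − R²) are assumed here, and all function names are hypothetical.

```python
def variance_explained(beta, eaf):
    """Approximate R^2 of one SNP (assumes beta is per SD of protein level)."""
    return 2.0 * eaf * (1.0 - eaf) * beta ** 2


def f_statistic(r2, n):
    """Approximate F-statistic of a single instrument from R^2 and sample size n."""
    return r2 * (n - 2) / (1.0 - r2)


def in_mhc(chrom, pos):
    """True if the position lies in the excluded MHC region (chr6: 26.0-34.0 Mb)."""
    return str(chrom) == "6" and 26_000_000 <= pos <= 34_000_000


def is_cis(snp_chrom, snp_pos, gene_chrom, tss_pos, window=1_000_000):
    """True if the lead SNP is within 1 Mb of the gene's transcription start site."""
    return str(snp_chrom) == str(gene_chrom) and abs(snp_pos - tss_pos) <= window


def keep_instrument(beta, eaf, n, chrom, pos, p_value):
    """Apply the p-value, MHC, and F > 10 filters to one candidate pQTL."""
    if p_value >= 5e-8 or in_mhc(chrom, pos):
        return False
    return f_statistic(variance_explained(beta, eaf), n) >= 10.0
```

Linkage disequilibrium clumping is not shown because it requires a reference panel.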

The TwoSampleMR package was used for the MR analysis.¹⁶ For proteins with only one genetic instrument, the Wald ratio method was employed to calculate the log odds change in PE risk per standard deviation (SD) increase in circulating protein levels, with the instrument serving as a proxy. For proteins with multiple genetic instruments, the inverse variance weighted (IVW) method was applied to obtain MR effect estimates. To determine heterogeneity among genetic instruments, Cochran’s Q test was conducted. In addition, we performed further analyses, including simple median, weighted median, and MR–Egger analyses, to address potential pleiotropy.
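For readers unfamiliar with these estimators, the following is a minimal sketch of the Wald ratio and fixed-effect IVW calculations, together with Cochran's Q. It is not the code of the R package used in the study; the function names are hypothetical, and the Wald ratio standard error uses the first-order approximation that ignores uncertainty in the SNP-exposure estimate, which is an assumption.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-instrument estimate: SNP-outcome effect divided by SNP-exposure effect."""
    estimate = beta_out / beta_exp
    se = abs(se_out / beta_exp)  # first-order approximation (assumption)
    return estimate, se


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate across several instruments, plus Cochran's Q."""
    weights = [bx ** 2 / so ** 2 for bx, so in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviation of each ratio from the pooled estimate
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, se, q


def odds_ratio(estimate):
    """Convert a log odds estimate per SD of protein level into an odds ratio."""
    return math.exp(estimate)
```

Cochran's Q would be compared with a chi-square distribution with (number of instruments − 1) degrees of freedom.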

MR analysis was conducted on identified proteins using PE GWAS data from the FinnGen database and UKBB, initially using p < 0.05 as the threshold for preliminary significance. A meta-analysis of the MR data from two sources was conducted employing either a random effects model or a fixed effects model. The model choice was based on dataset heterogeneity: A random effects model was used when heterogeneity was present, and a fixed effects model was applied when heterogeneity was absent. For multiple testing correction, we performed Bonferroni correction, setting the significance level at p.adj <0.05. All analyses were conducted with R software version 4.1.2.
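The model-selection rule described above can be sketched as follows for the two-source case. This is an illustrative sketch only: the heterogeneity cutoff of 0.05, the use of Cochran's Q as the heterogeneity test, and the DerSimonian-Laird estimator for the random effects model are assumptions that the text does not specify, and the function names are hypothetical.

```python
import math


def meta_two_studies(beta1, se1, beta2, se2, alpha=0.05):
    """Pool two MR estimates; use random effects only if Cochran's Q is significant."""
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    fixed = (w1 * beta1 + w2 * beta2) / (w1 + w2)
    q = w1 * (beta1 - fixed) ** 2 + w2 * (beta2 - fixed) ** 2
    # With two studies Q has 1 degree of freedom, so its p value is erfc(sqrt(Q/2))
    p_het = math.erfc(math.sqrt(q / 2.0))
    if p_het >= alpha:
        return "fixed", fixed, math.sqrt(1.0 / (w1 + w2)), p_het
    # DerSimonian-Laird between-study variance (assumed estimator)
    c = (w1 + w2) - (w1 ** 2 + w2 ** 2) / (w1 + w2)
    tau2 = max(0.0, (q - 1.0) / c)
    r1, r2 = 1.0 / (se1 ** 2 + tau2), 1.0 / (se2 ** 2 + tau2)
    pooled = (r1 * beta1 + r2 * beta2) / (r1 + r2)
    return "random", pooled, math.sqrt(1.0 / (r1 + r2)), p_het


def bonferroni(p_values):
    """Bonferroni-adjusted p values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```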

Colocalization analysis

To calculate whether two associated signals (protein levels and PE risk) are consistent with a shared causal variant, rather than a noncausal variant due to linkage disequilibrium, Bayesian colocalization analysis was performed using protein data and FinnGen PE GWAS data utilizing the “coloc” package.¹⁷ Five hypotheses of colocalization analysis provide a comprehensive framework for investigating the genetic underpinnings of both protein levels and PE risk at the genomic locus: (i) no causal variant affects protein levels or PE risk (H0); (ii) a causal variant affects only protein levels (H1); (iii) a causal variant affects only PE risk (H2); (iv) two distinct causal variants affect protein levels and PE risk independently (H3); and (v) a shared causal variant affects both protein levels and PE risk (H4).

Bayesian colocalization analysis was employed to calculate posterior probabilities for the five hypotheses mentioned above, evaluating whether the two traits share a single genetic variant. If a protein is linked to multiple pQTLs, colocalization analysis is conducted separately for each pQTL, focusing on the one exhibiting the most compelling evidence of colocalization. The analysis utilized default parameters, with prior probabilities set as follows: p1 = 1 × 10⁻⁴ (for SNP association with protein), p2 = 1 × 10⁻⁴ (for SNP association with PE), and p12 = 1 × 10⁻⁵ (for SNP association with both protein and PE).
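To make the five posterior probabilities concrete, the sketch below combines per-SNP log approximate Bayes factors (Wakefield's approximation) under the single-causal-variant assumption, using the priors listed above. It is a simplified illustration, not the "coloc" package itself; the function names are hypothetical, and the prior standard deviation of the effect size passed to `wakefield_labf` is an assumed tuning value.

```python
import math


def log_sum(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_diff(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    ratio = math.exp(b - a)
    if ratio >= 1.0:  # guard against rounding when one SNP dominates
        return float("-inf")
    return a + math.log1p(-ratio)


def wakefield_labf(beta, se, prior_sd):
    """Log approximate Bayes factor for one SNP (prior_sd is an assumption)."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for two traits in one region."""
    s1, s2 = log_sum(labf1), log_sum(labf2)
    s12 = log_sum([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                   # H0: no causal variant
        math.log(p1) + s1,                                     # H1: trait 1 only
        math.log(p2) + s2,                                     # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                   # H4: shared variant
    ]
    denom = log_sum(log_h)
    return [math.exp(h - denom) for h in log_h]
```

When both traits peak at the same SNP, most of the posterior mass falls on H4; when they peak at different SNPs, H3 outweighs H4.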

In our study, we focused on the posterior probability of hypothesis 4 (PPH4), which indicates the presence of a shared genetic variant affecting both protein levels and PE within a specific genomic region. We employed two widely recognized algorithms, namely, coloc.abf and coloc.susie, to evaluate the extent of colocalization. Strong colocalization was defined as PPH4 >80%, moderate colocalization as PPH4 between 60% and 80%, and weak colocalization as PPH4 <60%. A gene was considered to show evidence of colocalization if it demonstrated a gene-based PPH4 >60%, as determined by at least one of the algorithms.

SMR analysis

SMR analysis was further performed to verify the causal correlation between proteins and PE. Furthermore, we conducted the HEIDI test, employing multiple SNPs within a specific genomic region, to distinguish proteins associated with PE risk due to shared genetic variation from those influenced solely by linkage disequilibrium. Both the SMR and HEIDI tests were performed by SMR software (version 1.3.1). A significance threshold of p < 2.38 × 10⁻³ (0.05/21) was established for the SMR analysis. A p value >0.05 in the HEIDI test indicated that the observed association between the protein and PE was not driven by linkage disequilibrium. However, due to database limitations, not all proteins were subjected to HEIDI testing.
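As an illustration of the SMR test, the sketch below computes the SMR effect estimate and its approximate chi-square statistic from the summary statistics of a single top pQTL. It follows the commonly published form of the statistic, T = z_zx² z_zy² / (z_zx² + z_zy²) with 1 degree of freedom, and is not the SMR software itself; the function name is hypothetical, and the HEIDI test is not shown because it requires multiple SNPs and a linkage disequilibrium reference.

```python
import math

SMR_THRESHOLD = 0.05 / 21  # Bonferroni-style threshold stated in the text


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """SMR estimate and p value from SNP-protein (zx) and SNP-PE (zy) statistics."""
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx  # estimated effect of the protein on PE
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_value
```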

Downloading and preprocessing of single-cell sequencing data¹⁸

The Gene Expression Omnibus (GEO) database hosts an extensive collection of single-cell sequencing data. In this study, the single-cell sequencing dataset GSE173193 containing data from a total of 8 samples was acquired from the GEO database. This dataset included data from 2 PE placental tissue samples and 2 normal placental samples. The single-cell raw data from GSE173193 were imported using the Seurat package (version 4.2.0) in R.¹⁹ Initially, cells and genes of low quality were filtered out using the following criteria: (i) cells expressing fewer than 200 genes were removed, and (ii) genes not detected in any cells were discarded. Cells with a gene expression count ranging from 200 to 9,000 and cells with mitochondrial gene percentages under 20% were retained. Moreover, cells with fewer than 90,000 unique molecular identifier counts were retained. The “NormalizeData” function within the Seurat R package was used for data normalization. Following normalization, highly variable genes in single cells were identified by balancing the relationship between average expression and dispersion. Subsequently, principal component (PC) analysis was performed, and the significant PCs were utilized as inputs for graph-based clustering. The Harmony method was employed to eliminate batch effects across different samples. For clustering, the FindClusters function, which is based on a clustering algorithm optimizing shared nearest neighbor modularity, was utilized to produce 21 clusters across 25 PCs at a resolution of 0.4. t-Distributed stochastic neighbor embedding (t-SNE) was then performed using the “RunTSNE” function. Cell clustering was visualized using t-SNE-1 and t-SNE-2.
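The cell-level quality-control thresholds above can be expressed as a simple filter. The study performed this step in R with Seurat; the Python sketch below only restates the thresholds, with hypothetical field names and an assumed inclusive treatment of the 200 and 9,000 gene boundaries.

```python
def passes_qc(n_genes, n_umis, pct_mito):
    """True if a cell meets the gene-count, UMI-count, and mitochondrial thresholds."""
    return (
        200 <= n_genes <= 9000   # detected genes per cell (boundaries assumed inclusive)
        and n_umis < 90000       # unique molecular identifier counts
        and pct_mito < 20.0      # percentage of mitochondrial gene counts
    )


def filter_cells(cells):
    """Keep only the cells (dicts with hypothetical keys) that pass quality control."""
    return [
        cell for cell in cells
        if passes_qc(cell["n_genes"], cell["n_umis"], cell["pct_mito"])
    ]
```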

PPI network analysis

The GeneMANIA website (http://genemania.org) facilitates the prediction of relationships between functionally similar genes and central genes, encompassing PPIs, protein–DNA interactions, pathways, physiological and biochemical reactions, coexpression, and colocalization.²⁰ In this study, a PPI network of key genes was constructed via the GeneMANIA website. Subsequently, Gene Ontology (GO) analysis was performed on the key genes and their interacting genes using the “clusterProfiler” R package.

Phenome-wide MR studies

To investigate the potential side effects of four drug targets, we employed gene expression as the exposure factor and utilized disease summary statistics from the UKBB (n ≤ 420,531) as outcomes for a comprehensive phenome-wide MR analysis. The UKBB Disease GWASs were conducted using the scalable and accurate implementation of generalized mixed model (SAIGE) method, which addresses imbalanced case–control ratios. Due to statistical power considerations, we selected 851 traits (diseases) other than PE for phenome-wide MR analysis, each with more than 100 cases. Summary statistics for disease-associated SNPs were downloaded from the SAIGE GWAS (https://www.leelabsg.org/resources). Subsequently, MR analysis was performed using either the IVW or Wald ratio method with identical parameters, leveraging pQTLs. A causal effect with a false discovery rate (FDR) <0.05 was deemed statistically significant.
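The FDR criterion can be illustrated with the Benjamini-Hochberg procedure. The text does not name the FDR method used, so the choice of Benjamini-Hochberg here is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone adjusted values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A trait would then be called significant when its adjusted value is below 0.05, for example across the 851 traits tested for one protein.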

Results

Proteome-wide MR analysis identified seven circulating plasma proteins associated with PE

All the genetic instruments had F-statistics greater than 10, indicating strong instrument strength (Supplementary Table S1). Using the Wald ratio or IVW methods, and after the Bonferroni correction (p.adj < 0.05), a total of seven proteins were found to be significantly associated with the risk of PE (Table 1). Genetically predicted levels of KDEL (Lys-Asp-Glu-Leu) Containing 2 (KDELC2) and Keratin 18 (KRT18) were positively associated with the risk of PE, whereas those of the other five proteins (Complement Factor B [CFB], FYN Proto-Oncogene [FYN], RAN Binding Protein 1 [RANBP1], Amphoterin-Induced Gene and ORF 1 [AMIGO1], and Arginase 2 [ARG2]) were negatively associated with the risk of PE, suggesting that lower levels of these five proteins are linked to a greater risk of PE (Fig. 1). These associations were generally consistent across additional analyses including the simple median, weighted median, and MR–Egger analyses. In addition, the CFB protein was found to be heterogeneous and pleiotropic, the KDELC2 protein was found to be neither heterogeneous nor pleiotropic (p for heterogeneity > 0.05, p for pleiotropy > 0.05), and the heterogeneity and pleiotropy of the remaining proteins could not be tested due to data structure limitations (Table 2). The results of the proteome-wide MR within the discovered protein set are presented in Supplementary Table S1.

FIG. 1.

The meta-analysis results of proteome-wide MR analysis.

Table 1.

The Results of the Proteome-Wide MR

Protein	OR	p.adj	Coloc.abf	Coloc.susie	SMR	HEIDI	Type
KRT18	1.002900696	0.041541783	0.741	—	0.002	—	Tier 1
AMIGO1	0.997385008	0.045547411	0.329	—	0.011	—	Tier 2
ARG2	0.997224679	0.017814606	0.187	—	0.023	—	Tier 2
CFB	0.999380403	0.041541783	0.012	1	0.595	0.27	Tier 2
FYN	0.998933387	0.045547411	—	—	0.018	—	Tier 2
RANBP1	0.997157217	0.017814606	0.252	—	0.021	—	Tier 2
KDELC2	1.000347006	0.045547411	0.16	—	NA	NA	Tier 3

AMIGO1, Amphoterin-Induced Gene and ORF 1; ARG2, Arginase 2; CFB, Complement Factor B; FYN, FYN Proto-Oncogene; HEIDI, heterogeneity in dependent instruments; KDELC2, KDEL (Lys-Asp-Glu-Leu) Containing 2; KRT18, Keratin 18; MR, Mendelian randomization; RANBP1, RAN Binding Protein 1; SMR, summary-data-based Mendelian randomization.

Table 2.

The Heterogeneity and Pleiotropy Testing for CFB and KDELC2

ID.outcome	Outcome	Exposure	Method	Q	Q_df	Q_p Value	Heterogeneity	Egger_intercept	SE	p Value
finngenR10O15	PE	CFB	MR Egger	1.380677312	2	0.5	FALSE	0.5	0.5	0.5
finngenR10O15	PE	CFB	Inverse variance weighted	8.536497345	3	0.04	TRUE	0.04	0.04	0.04
finngenR10O15	PE	KDELC2	MR Egger	0.380727449	3	0.94	FALSE	0.94	0.94	0.94
finngenR10O15	PE	KDELC2	Inverse variance weighted	1.380111822	4	0.85	FALSE	0.85	0.85	0.85
UKBB	PE	CFB	MR Egger	1.224008238	2	0.54	FALSE	0.54	0.54	0.54
UKBB	PE	CFB	Inverse variance weighted	4.257203831	3	0.23	FALSE	0.23	0.23	0.23
UKBB	PE	KDELC2	MR Egger	1.422691747	3	0.7	FALSE	0.7	0.7	0.7
UKBB	PE	KDELC2	Inverse variance weighted	1.555057662	4	0.82	FALSE	0.82	0.82	0.82

PE, preeclampsia.

In a meta-analysis combining data from the two sources, several proteins were found to be significantly associated with PE risk. The odds ratios (ORs) for PE per SD increase in the levels of proteins predicted by gene analysis, with 95% confidence intervals (CIs), were as follows: the OR for KDELC2 was 1.000 (95% CI: 1.000–1.001), that for CFB was 0.999 (95% CI: 0.999–1.000), that for FYN was 0.999 (95% CI: 0.998–1.000), that for KRT18 was 1.003 (95% CI: 1.001–1.005), that for RANBP1 was 0.997 (95% CI: 0.995–0.999), that for AMIGO1 was 0.997 (95% CI: 0.995–1.000), and that for ARG2 was 0.997 (95% CI: 0.995–0.999).

In the reverse MR analysis, no association was detected between genetic susceptibility to PE and the levels of these seven proteins (Table 3).

Table 3.

The Results of the Reverse MR Analysis

Exposure	Beta	p Value
AMIGO1	0.020250885	0.726724792
ARG2	−0.006800121	0.923584996
CFB	−0.049217538	0.781869819
KDELC2	−0.026321498	0.728712637
KRT18	0.049811585	0.369718619
RANBP1	−0.090406601	0.22505947

Colocalization analysis supports the causal relationship between two proteins and PE

Among the seven potential causal proteins identified by proteome-wide MR, one protein (FYN) lacked complete summary-level data, rendering it untestable through colocalization analysis. Among the remaining six proteins, evidence of genetic colocalization supported the causal relationship of two proteins with PE under various priors and windows: strong evidence for CFB (coloc.susie PPH4 of 1) and moderate evidence for KRT18 (coloc.abf PPH4 of 0.74). This suggests a high probability of shared single causal variants between CFB protein levels and PE risk and a moderate probability of shared multiple causal variants between KRT18 protein levels and PE risk (Table 4).

Table 4.

The Causal Relationship Between Six Proteins and PE by Colocalization Analysis

Outcome	Exposure	Source	SNP	Windows	nsnps	PP.H4.abf	Method
PE	CFB	finngenR10O15	rs2072634	90	307	1	susie
PE	AMIGO1	finngenR10O15	rs2570972	90	576	0.328831573	coloc
PE	ARG2	finngenR10O15	rs61990120	90	711	0.187475997	coloc
PE	KDELC2	finngenR10O15	rs74911261	90	609	0.159964041	coloc
PE	KRT18	finngenR10O15	rs4919741	90	967	0.740534701	coloc
PE	RANBP1	finngenR10O15	rs76889809	90	866	0.252057097	coloc

The SMR and HEIDI tests validated six pathogenic proteins

To further validate the observed findings, SMR and HEIDI tests were conducted on the seven proteins with complete summary-level data. All proteins except KDELC2 passed the SMR test (p.adj < 0.05). Due to the lack of sufficient SNPs, proteins other than CFB could not undergo HEIDI testing. Based on the evidence, these proteins were categorized into three tiers. One protein (KRT18) that passed all tests was categorized as Tier 1 (Table 1). The five proteins that either failed the colocalization analysis or HEIDI test or could not be tested due to a lack of data (AMIGO1, ARG2, CFB, FYN, and RANBP1) were categorized as Tier 2 proteins (Table 1). The KDELC2 protein, which failed the meta-analysis, colocalization analysis, and HEIDI test, was categorized as Tier 3 (Table 1).

Research on cell type-specific expression in PE tissues

To investigate whether the genes encoding the six circulating proteins were enriched in specific cell types within PE tissues, we further conducted single-cell type expression analysis using single-cell RNA sequencing data from the GEO database. The cells were clustered into 21 clusters and further classified into nine cell types (granulocyte, villous cytotrophoblast [VCT], macrophage, extravillous trophoblast [EVT], myelocyte, T/natural killer [T/NK] cell, monocyte, syncytiotrophoblast [SCT], and B lymphocyte [B cell]) (Fig. 2A). The gene encoding KRT18 was primarily expressed in VCT and EVT cells (Fig. 2B and C). The genes encoding ARG2 and RANBP1 were mainly expressed in VCT cells (Fig. 2D and E). The gene encoding FYN was predominantly found in T/NK cells (Fig. 2F). The genes encoding AMIGO1 and CFB were not enriched in any specific cell type in the single-cell dataset (Fig. 2G and H).

FIG. 2.

The single-cell type expression of protein-encoding genes identified through proteome-wide MR in PE tissues. (A) Identified nine cell types. (B) The distribution of each gene across different cells and groups. (C) The distribution of the encoding gene for protein KRT18 in the single-cell dataset. (D) The distribution of the encoding gene for protein ARG2. (E) The distribution of the encoding gene for protein RANBP1. (F) The distribution of the encoding gene for protein FYN. (G) The distribution of the encoding gene for protein AMIGO1. (H) The distribution of the encoding gene for protein CFB in the single-cell dataset.

PPI network

We constructed a PPI network for circulating proteins using the GeneMANIA database (Fig. 3A). To further investigate the functions of the characteristic proteins, we performed GO enrichment analysis on a total of 26 proteins, including 6 circulating proteins and 20 proteins associated with the 6 circulating proteins. The GO enrichment results revealed that BPs, such as negative regulation of chemokine production and response to l-glutamate, and cellular components, such as the extrinsic component of the membrane (Fig. 3B and C, Table 5), were enriched in these proteins.

FIG. 3.

The protein interaction network. (A) Gene co-expression network diagram. (B) Lollipop chart for the GO enrichment of co-expressed genes. (C) The bar chart for the GO enrichment of co-expressed genes. GO, Gene Ontology.

Table 5.

The Results of PPI Network

ID	Ontology	Description	Gene ratio	Bg ratio	p Value	p.adj	q Value	GeneID	Count
GO:0032682	BP	Negative regulation of chemokine production	3/26	26/18,723	6.05E-06	0.005167641	0.003713466	ARG2/APOD/LILRB4	3
GO:1902065	BP	Response to l-glutamate	2/26	11/18,723	0.000101206	0.043215064	0.031054335	AMIGO1/FYN	2
GO:0019898	CC	Extrinsic component of membrane	4/26	309/19,550	0.000695282	0.049364992	0.040253144	FYN/RS1/SOCS3/PLAUR	4

PPI, protein–protein interaction.

MR analysis of GWASs on the identified PE drug target proteins and other diseases

We evaluated whether the expression of the six drug target proteins associated with PE plays a role in other diseases. Thus, a broader MR screening was conducted across 851 non-PE diseases or traits using the UKBB (Supplementary Table S2). CFB was significantly associated with digestive system diseases; lower levels of CFB were related to celiac disease (OR = 0.255457) and intestinal malabsorption (OR = 0.302310), while higher levels of CFB were associated with ulcerative colitis (OR = 1.760720) (Fig. 4 and Supplementary Table S3). No other diseases were found to have a significant association with these drug target proteins (FDR < 0.05), and the summary results are presented in Supplementary Table S3.

FIG. 4.

Manhattan plot of the phenome-wide MR results for AMIGO1, ARG2, CFB, FYN, KRT18, and RANBP1. Note: In the phenome-wide MR results, the vertical axis represents p values. Each point represents a disease trait, and different colors signify the MR results for different proteins.

Discussion

To our knowledge, this is the first MR analysis to identify drug targets for PE based on pQTL data from five large-scale proteomic studies. Here, we identified seven proteins as candidate targets for PE, including KDELC2, KRT18, CFB, FYN, RANBP1, AMIGO1, and ARG2. Except for CFB and KDELC2, the five other proteins were shown to be associated with PE via MR analysis, further suggesting the reliability of the methods used in this study.

The observed modest ORs for proteins associated with the risk of PE imply that individual genetic influences are limited. This observation corresponds with the polygenic characteristics of complex diseases such as PE, in which the disease development is attributed to the cumulative impact of numerous variants with small effects across various biological pathways.^21,22 The presence of small effect sizes does not diminish their biological importance; instead, it may indicate a widespread distribution of risk within regulatory networks. Notably, proteins exhibiting modest genetic influences can still represent valuable therapeutic targets, particularly if they are situated at crucial regulatory points (for instance, upstream signaling nodes or network hubs) where slight adjustments could trigger significant downstream effects. A case in point is KRT18, which, despite its limited effect size, is abundantly expressed in trophoblasts and plays a vital role in regulating essential processes such as cell adhesion and apoptosis, highlighting its potential relevance in pharmacotherapy. To achieve robust causal inference, we adopted a comprehensive multi-tiered analytical strategy. This strategy included MR, Bayesian colocalization, SMR/HEIDI tests, single-cell expression analysis, and protein–protein interaction networks. Both CFB and KRT18 demonstrated evidence of shared causal variants with PE through colocalization analysis. Following thorough validation, proteins were classified into three tiers: KRT18 (Tier 1) successfully passed all analytical evaluations; KDELC2 (Tier 3) did not meet multiple testing criteria, while the remaining five proteins (ARG2, RANBP1, FYN, AMIGO1, and CFB) were designated as Tier 2. Furthermore, single-cell expression analysis revealed cell type-specific expression of KRT18, ARG2, and RANBP1 in trophoblast cells and of FYN in T/NK cells, and phenome-wide MR analysis revealed associations between CFB and various digestive diseases.
Collectively, these findings support the biological plausibility and therapeutic promise of the identified proteins, particularly KRT18, in the context of PE pathogenesis.

Despite many years of drug development, the current therapeutic options for PE remain unsatisfactory. Considering the pathogenesis of PE, which is characterized by compromised placental vasodilation and increased maternal blood pressure, resulting in a reduced blood supply to the fetus,²³ we explored the causal proteins for PE and their distribution in placenta-associated cells. Notably, in our study, we found that the genes encoding the identified circulating plasma proteins were mainly expressed in trophoblast cells.²⁴ We also conducted bidirectional MR analysis among the seven identified proteins and failed to observe any significant associations. The presence of the placental barrier may explain the absence of any significant bidirectional correlations. Although the evidence remains scarce, the current study suggested that plasma might be a valuable resource for identifying proteins associated with PE and that the proteins circulating in the plasma might be promising drug targets for PE treatment.

Among the five potential proteins identified in this study, the roles of ARG2 and KRT18 in PE have been relatively well explored in previous studies.^25–28 Unlike our study, these studies only reported the levels of these proteins in the plasma of patients with PE. Owing to the importance of identifying protein drug targets for the success of precision or personalized medicine approaches, our study provides new insights into the development of therapeutic strategies for PE.

KRT18, also known as cytokeratin 18, is an intermediate filament protein that acts as a proinflammatory cytokine in serum and as an apoptotic marker. The correlation between circulating KRT18 levels and PE risk is positive, consistent with previous studies that have reported significantly increased KRT18 levels in the plasma of patients with PE. Moreover, the gene encoding KRT18 was primarily expressed in VCT and EVT cells. Despite the difference in tissue-specific expression of KRT18, it was the only protein that was validated as Tier 1, indicating a greater probability of KRT18 being a causal protein for PE.

KRT18, commonly referred to as cytokeratin 18, is classified as an intermediate filament protein that serves as an indicator of apoptosis and inflammation in serum samples.^29,30 Our research highlights a significant association between the levels of circulating KRT18 and the risk of developing PE, corroborating previous studies that have documented increased KRT18 levels in patients diagnosed with PE.²⁸ Moreover, KRT18 is expressed predominantly in VCT and EVT cells, and it has been recognized as a Tier 1 causal protein in the context of PE, providing compelling evidence for its role in the pathogenesis of this disease. Nevertheless, owing to the critical biological functions of KRT18, its ability to act as a direct therapeutic target necessitates careful consideration, despite its notable correlation with PE. As a fundamental component of epithelial intermediate filaments, KRT18 plays a vital role in maintaining cytoskeletal stability, facilitating cell–cell adhesion, responding to stress, and preserving tissue barriers, with pronounced expression observed in placental, hepatic, and gastrointestinal epithelial tissues. Direct inhibition of KRT18 may result in detrimental consequences, including disruption of epithelial structure, which can lead to apoptosis or necrosis; dysfunction of the placental barrier, affecting maternal–fetal exchange, and damage to hepatocytes. This assertion is further supported by its recognized role as a diagnostic biomarker in liver diseases. In light of these potential risks, KRT18 may be more appropriately regarded as a circulating biomarker or as a reflection of upstream regulatory mechanisms, such as those related to inflammation or oxidative stress, rather than a direct target for therapeutic intervention. Subsequent research should aim to identify specific modulators that can influence KRT18 expression or activity, facilitating safer, indirect therapeutic strategies. 
In summary, while KRT18 demonstrates a strong causal relationship with PE, its structural significance within epithelial tissues emphasizes its potential utility as a monitoring tool for disease rather than as a direct target for pharmacological treatment.

Human ARG2 is a mammalian arginase isoform encoded by ARG2, which is located on chromosome 14q24.³¹ As a key hydrolase in the urea cycle, ARG2, which contains 354 amino acids, is abundantly expressed in mitochondria and is preferentially expressed in the kidney, lactating mammary gland, and macrophages.³² Consistent with our results, previous studies have indicated that a low concentration of ARG2 is associated with an increased risk of PE.³³ In addition, we found a relationship between ARG2 and VCT cells in PE tissues. Given the lack of colocalization evidence for ARG2 and its specific distribution in VCT cells, we speculated that the effect of ARG2 in PE might be blocked by the placental barrier. In other words, ARG2 might be a promising druggable target in the circulation of the placenta. In addition, our PPI analysis revealed that ARG2, APOD, and LILRB4 were significantly enriched in the negative regulation of chemokine production. We noticed that a reduction in plasma ARG2 levels was reported to improve the responsiveness to antihypertensive treatment in PE patients, suggesting that ARG2 may have therapeutic value in PE and deserves further study.

RANBP1 is a binding partner of RAN, a member of the RAS superfamily of small GTPases that participates in the nucleocytoplasmic transport of proteins, nucleic acids, and microRNAs and contributes to the cellular epigenomic signature.³⁴ Previous studies have not reported an association between RANBP1 and the pathogenesis of PE. However, our MR analysis revealed that RANBP1 is a potential target for PE medications, and RANBP1 was confirmed to be primarily distributed in VCT cells in this study. Because RANBP1 is primarily an intracellular protein that is highly expressed both intracellularly and extracellularly, we hypothesized that it might be a drug target for PE.

FYN and AMIGO1 were protective proteins against PE in our study. FYN is a member of the Src family of protein kinases that regulates multiple cellular processes, including cell adhesion, invasion, proliferation, survival, apoptosis, and angiogenesis.³⁵ FYN interacts with and phosphorylates a wide variety of proteins, such as RS1, SOCS3, and PLAUR, suggesting that FYN might act by reducing the stability of the amniotic membrane, promoting the development of PE. Notably, FYN was also confirmed to be enriched in T/NK cells, which are immune cells that play a major causative role in the pathology of PE.³⁶ Maternal immune tolerance is an important contributor to pregnancy. An insufficient number of T cells or their inadequate functional competence is implicated in PE, which stems from placental insufficiency.³⁷ Taken together, these findings indicate that FYN might be a potential therapeutic target for PE and may be involved in immune responses. AMIGO1 is an LRR-domain cell adhesion molecule preferentially expressed on nerve cells that mediates the fasciculation and myelination of developing axons.³⁸ In contrast to the other identified PE targets in this study, AMIGO1 has been shown to affect neuronal genes.³⁹,⁴⁰

In a recent study examining the pathogenesis of PE, Xu et al. performed MR and colocalization analyses involving 734 plasma proteins within the FinnGen cohort.⁴¹ Their research revealed several potential candidate proteins, such as CXCL10, PZP, AHSG, and UROS, indicating the involvement of immune and inflammatory pathways in the progression of this disease. Building upon the findings of Xu et al., the current investigation broadens the proteomic analysis by incorporating cis-pQTL data from five large studies, which together cover 2,968 plasma proteins.

From a methodological standpoint, we employed a two-stage design that included both discovery and replication phases within the FinnGen and UKBB cohorts. This approach was further enhanced by meta-analysis and validation techniques, including Bayesian colocalization, SMR/HEIDI tests, single-cell expression profiling, and the exploration of protein–protein interaction networks. Our analysis revealed a unique set of proteins—KRT18, ARG2, FYN, RANBP1, AMIGO1, and CFB—that are linked to cytoskeletal integrity, metabolic regulation, and cellular adhesion, thereby contrasting with the immune-inflammatory targets identified by Xu et al. Rather than presenting conflicting outcomes, these disparate findings underscore the multifaceted nature of PE and illustrate how diverse methodological frameworks can uncover complementary pathological mechanisms. These results imply that the pathogenesis of PE involves simultaneous dysregulation across immune, structural, and metabolic pathways. Future research should integrate multiomics data from a variety of populations to clarify the interactions among these mechanisms and facilitate the advancement of targeted therapeutic interventions.
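To make the combination of discovery and replication estimates concrete, the following is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis of two per-cohort MR estimates on the log-odds scale. It is an illustration only: the text above does not state which meta-analysis model was used, the function name `fixed_effect_meta` is hypothetical, and the input numbers are invented rather than taken from the study.

```python
import math


def fixed_effect_meta(estimates):
    """Pool (log_odds_ratio, standard_error) pairs by inverse-variance weighting."""
    # Each cohort is weighted by the inverse of its variance (1 / se^2).
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical per-cohort MR estimates on the log-odds scale (not study results).
discovery = (0.20, 0.05)
replication = (0.10, 0.10)

pooled_beta, pooled_se = fixed_effect_meta([discovery, replication])

# Convert back to an odds ratio with a 95% confidence interval.
pooled_or = math.exp(pooled_beta)
ci_low = math.exp(pooled_beta - 1.96 * pooled_se)
ci_high = math.exp(pooled_beta + 1.96 * pooled_se)
print(f"pooled OR = {pooled_or:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

In this sketch the more precise cohort dominates the pooled estimate, which is why a small replication cohort shifts the combined odds ratio only slightly.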

Our study has several limitations. First, the proteins were prioritized through screening, and although sensitivity analyses were performed, potential bias could not be excluded. Second, most participants in the GWAS datasets were of European descent, so the results should be applied with caution to populations of non-European descent. Third, some factors associated with PE, such as parity and maternal age, were not included. Fourth, owing to the limited availability of clinical data from Asian groups, the causal relationship between these candidate proteins and PE was not validated in those populations. Fifth, the odds ratios for the analyzed proteins were not large: constrained by the existing data, the observed effects of these proteins on PE were modest, whether positive or negative. However, the significant p values and previous studies suggest their potential value as targets for PE. Finally, the examination of five proteins (KRT18, AMIGO1, ARG2, FYN, and RANBP1) was limited by the number of available SNPs, which restricted the assessment of heterogeneity and pleiotropy. Because a large amount of data is required, analyses of proteins other than CFB may not be entirely reliable, and the results of these analyses should be interpreted with caution. Therefore, future work should focus on the proteins related to PE, and additional experimental validation is required to substantiate these findings.
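The SNP-availability limitation can be illustrated with a short sketch. When a protein has a single genetic instrument, the causal estimate reduces to a Wald ratio, and a heterogeneity statistic such as Cochran's Q cannot be computed because it needs at least two instruments. The code below assumes a first-order standard error for the Wald ratio; the function names and the numbers are hypothetical and are not taken from the study.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-SNP causal estimate: SNP-outcome effect over SNP-exposure effect."""
    estimate = beta_outcome / beta_exposure
    # First-order approximation: ignores uncertainty in the SNP-exposure effect.
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def cochran_q(ratio_estimates):
    """Return (Q, degrees_of_freedom), or None when fewer than two instruments exist."""
    if len(ratio_estimates) < 2:
        return None  # heterogeneity cannot be assessed with one instrument
    weights = [1.0 / se ** 2 for _, se in ratio_estimates]
    ivw = sum(w * e for w, (e, _) in zip(weights, ratio_estimates)) / sum(weights)
    q = sum(w * (e - ivw) ** 2 for w, (e, _) in zip(weights, ratio_estimates))
    return q, len(ratio_estimates) - 1


# Hypothetical cis-pQTL: effect on protein level and on PE log-odds (not study data).
estimate, se = wald_ratio(beta_exposure=0.5, beta_outcome=-0.05, se_outcome=0.01)
odds_ratio = math.exp(estimate)
print(f"OR per unit increase in protein level = {odds_ratio:.3f}")
print("Cochran's Q with one instrument:", cochran_q([(estimate, se)]))
```

With only one instrument the function returns `None`, mirroring why heterogeneity and pleiotropy checks were constrained for the proteins with few available SNPs.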

Conclusions

In summary, our MR analysis suggested that the plasma levels of the identified proteins (KRT18, FYN, RANBP1, AMIGO1, and ARG2) are causally associated with PE risk. The identified proteins may be potential biomarkers or druggable targets for PE, especially circulating KRT18. Based on our results, we hypothesize that KRT18 plays a role in the development of PE, and future clinical studies should be conducted to verify the effects of this protein on PE. Further work is needed to evaluate the credibility of these candidate proteins in PE treatment.

Author Disclosure Statement

No competing financial interests exist.

References

1. Jung, Romero, Yeo, et al. The etiology of preeclampsia. Am J Obstet Gynecol 2022;226(2S):S844–S866.

2. Chappell, Cluver, Kingdom, et al. Pre-eclampsia. Lancet 2021;398(10297):341–354.

3. Chaiworapongsa, Romero, Gotsch, et al. Preeclampsia at term can be classified into 2 clusters with different clinical characteristics and outcomes based on angiogenic biomarkers in maternal blood. Am J Obstet Gynecol 2023;228(5):569.e561–569.e524.

4. Burton, Jauniaux. The human placenta: New perspectives on its formation and function during early pregnancy. Proc Biol Sci 2023;290(1997):20230191.

5. Zeisler, Llurba, Chantraine, et al. Predictive value of the sFlt-1:PlGF ratio in women with suspected preeclampsia. N Engl J Med 2016;374(1):13–22.

6. Wang, Li, Zhao. Inflammation in preeclampsia: Genetic biomarkers, mechanisms, and therapeutic strategies. Front Immunol 2022;13:883404.

7. Suhre, McCarthy, Schwenk. Genetics meets proteomics: Perspectives for large population-based studies. Nat Rev Genet 2021;22(1):19–37.

8. Pernemalm, Sandberg, Zhu, et al. In-depth human plasma proteome analysis captures tissue proteins and transfer of protein variants across the placenta. Elife 2019;8:e41608.

9. Rana, Lemoine, Granger, et al. Preeclampsia: Pathophysiology, challenges, and perspectives. Circ Res 2019;124(7):1094–1112.

10. Ferkingstad, Sulem, Atlason, et al. Large-scale integration of the plasma proteome with genetics and disease. Nat Genet 2021;53(12):1712–1721.

11. Sun, Chiou, Traylor, et al.; Regeneron Genetics Center. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 2023;622(7982):329–338.

12. Pietzner, Wheeler, Carrasco-Zanini, et al. Genetic architecture of host proteins interacting with SARS-CoV-2. bioRxiv 2020:2020.07.01.182709.

13. Gudjonsson, Gudmundsdottir, Axelsson, et al. A genome-wide association study of serum proteins reveals shared loci with common diseases. Nat Commun 2022;13(1):480.

14. Zhang, Dutta, Köttgen, et al.; CKDGen Consortium. Plasma proteome analyses in individuals of European and African ancestry identify cis-pQTLs and models for proteome-wide association studies. Nat Genet 2022;54(5):593–602.

15. Sun, Zhao, Jiang, et al. Identification of novel protein biomarkers and drug targets for colorectal cancer by integrating human plasma proteome with genome. Genome Med 2023;15(1):75.

16. Su, Gu, Dou, et al. Systematic druggable genome-wide Mendelian randomisation identifies therapeutic targets for Alzheimer’s disease. J Neurol Neurosurg Psychiatry 2023;94(11):954–961.

17. Kia, Zhang, Guelfi, et al.; United Kingdom Brain Expression Consortium (UKBEC) and the International Parkinson’s Disease Genomics Consortium (IPDGC). Identification of candidate Parkinson disease genes by integrating genome-wide association study, expression, and epigenetic data sets. JAMA Neurol 2021;78(4):464–472.

18. Chen, Zhang, et al. A necroptosis related prognostic model of pancreatic cancer based on single cell sequencing analysis and transcriptome analysis. Front Immunol 2022;13:1022420.

19. Butler, Hoffman, Smibert, et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol 2018;36(5):411–420.

20. Warde-Farley, Donaldson, Comes, et al. The GeneMANIA prediction server: Biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res 2010;38(Web Server issue):W214–W220.

21. Ganlu, Yihong, Cao, et al. BMP4 promotes the metastasis of gastric cancer by inducing epithelial-mesenchymal transition via ID1. J Cell Sci 2020;133(11).

22. Guohua, Cheng, Jiacheng, et al. Dynamic molecular atlas of cardiac fibrosis at single-cell resolution shows CD248 in cardiac fibroblasts orchestrates interactions with immune cells. Nat Cardiovasc Res 2025;4(4).

23. Aplin, Myers, Timms, et al. Tracking placental development in health and disease. Nat Rev Endocrinol 2020;16(9):479–494.

24. Burton, Redman, Roberts, et al. Pre-eclampsia: Pathophysiology and clinical implications. BMJ 2019;366:l2381.

25. Luizon, Pinto-Souza, Coeli-Lacchini, et al. ARG2 single-nucleotide polymorphism rs3742879 affects plasma arginase 2 levels, nitric oxide formation and antihypertensive therapy response in preeclampsia. Pharmacogenomics 2022;23(13):713–722.

26. Pinto-Souza, Coeli-Lacchini, Luizon, et al. Effects of arginase genetic polymorphisms on nitric oxide formation in healthy pregnancy and in preeclampsia. Nitric Oxide 2021;109–110:20–25.

27. He, Xu, Wang, et al. Dysregulation of complement system during pregnancy in patients with preeclampsia: A prospective study. Mol Immunol 2020;122:69–79.

28. Li, Dong, Xue, et al. Increased expression levels of E-cadherin, cytokeratin 18 and 19 observed in preeclampsia were not correlated with disease severity. Placenta 2014;35(8):625–631.

29. Yip, Lyu, Lin, et al. Non-invasive biomarkers for liver inflammation in non-alcoholic fatty liver disease: Present and future. Clin Mol Hepatol 2023;29(Suppl):S171–S183.

30. Korver, Bowen, Pearson, et al. The application of cytokeratin-18 as a biomarker for drug-induced liver injury. Arch Toxicol 2021;95(11):3435–3448.

31. Niu, Yu, Li, et al. Arginase: An emerging and promising therapeutic target for cancer treatment. Biomed Pharmacother 2022;149:112840.

32. Pandey, Bhunia, Oh, et al. OxLDL triggers retrograde translocation of arginase2 in aortic endothelial cells via ROCK and mitochondrial processing peptidase. Circ Res 2014;115(4):450–459.

33. Bertozzi-Matheus, Bueno-Pereira, Viana-Mattioli, et al. Different profiles of circulating arginase 2 in subtypes of preeclampsia pregnant women. Clin Biochem 2021;92:25–33.

34. Audia, Brescia, Dattilo, et al. RANBP1 (RAN Binding Protein 1): The missing genetic piece in cancer pathophysiology and other complex diseases. Cancers (Basel) 2023;15(2):486.

35. Peng, Fu. FYN: Emerging biological roles and potential therapeutic targets in cancer. J Transl Med 2023;21(1):84.

36. Deer, Herrock, Campbell, et al. The role of immune cells and mediators in preeclampsia. Nat Rev Nephrol 2023;19(4):257–270.

37. Moldenhauer, Hull, Foyle, et al. Immune-metabolic interactions and T cell tolerance in pregnancy. J Immunol 2022;209(8):1426–1436.

38. Soto, Shen, Kerschensteiner. AMIGO1 promotes axon growth and territory matching in the retina. J Neurosci 2022;42(13):2678–2689.

39. Raja, Dumontier, Phen, et al. Insertion of a neomycin selection cassette in the Amigo1 locus alters gene expression in the olfactory epithelium leading to region-specific defects in olfactory receptor neuron development. Genesis 2024;62(2):e23594.

40. Chakraborty, Kahali. Exome-wide analysis reveals role of LRP1 and additional novel loci in cognition. HGG Adv 2023;4(3):100208.

41. Yuexin, Yingzi, Chengqian, et al. Finding potential drug targets for pre-eclampsia using Mendelian randomisation and colocalisation analysis. Am J Reprod Immunol 2025;93(3).


Potential Drug Targets and Causal Plasma Proteins Related to Preeclampsia Identified by Mendelian Randomization
