PRADclass: Hybrid Gleason Grade-Informed Computational Strategy Identifies Consensus Biomarker Features Predictive of Aggressive Prostate Adenocarcinoma

Abstract

Background

Prostate adenocarcinoma (PRAD) is a common cancer diagnosis among men globally, yet large gaps in our knowledge persist with respect to the molecular bases of its progression and aggression. It is mostly indolent and slow-growing, but aggressive prostate cancers need to be recognized early for optimising treatment, with a view to reducing mortality.

Methods

Based on TCGA transcriptomic data pertaining to PRAD and the associated clinical metadata, we determined the sample Gleason grade, and used it to execute: (i) Gleason-grade wise linear modeling, followed by five contrasts against controls and ten contrasts between grades; and (ii) Gleason-grade wise network modeling via weighted gene correlation network analysis (WGCNA). Candidate biomarkers were obtained from the above analysis and the consensus found. The consensus biomarkers were used as the feature space to train ML models for classifying a sample as benign, indolent or aggressive.

Results

The statistical modeling yielded 77 Gleason grade-salient genes while the WGCNA algorithm yielded 1003 trait-specific key genes in grade-wise significant modules. Consensus analysis of the two approaches identified two genes in Grade-1 (SLC43A1 and PHGR1), 26 genes in Grade-4 (including LOC100128675, PPP1R3C, NECAB1, UBXN10, SERPINA5, CLU, RASL12, DGKG, FHL1, NCAM1, and CEND1), and seven genes in Grade-5 (CBX2, DPYS, FAM72B, SHCBP1, TMEM132A, TPX2, UBE2C). A RandomForest model trained and optimized on these 35 biomarkers for the ternary classification problem yielded a balanced accuracy ∼ 86% on external validation.

Conclusions

The consensus of multiple parallel computational strategies has unmasked candidate Gleason grade-specific biomarkers. PRADclass, a validated AI model featurizing these biomarkers, achieved good performance and could be trialed to predict the differentiation of prostate cancers. PRADclass is available for academic use at: https://apalania.shinyapps.io/pradclass (online) and https://github.com/apalania/pradclass (command-line interface).

Keywords

prostate adenocarcinoma, Gleason grading, cancer aggressiveness, WGCNA-based reconstruction, grade-salient gene, trait-specific key gene, consensus biomarker, machine learning, cancer differentiation, risk stratification

Introduction

Global cancer incidence over the past decade has increased by 33%, affecting one-sixth of the aging population. In 2020 alone, 19.3 million new cancer cases and 10.0 million cancer deaths were reported.¹ Prostate adenocarcinoma (PRAD), more commonly prostate cancer, is the most common cancer diagnosis among men, and aggressive prostate cancers pose a challenge to men's health.² The rise in prostate cancer cases may be attributed to changes in lifestyle, diet, and environment³ as well as over-diagnosis. Early-stage prostate cancer is largely asymptomatic and progresses slowly, and often has an indolent course that might take more than 20 years to impact the quality of life.⁴ Malignant prostate cancers might necessitate prostatectomy, with a lifetime of treatment-related disabilities.^4,5 However, over-treating prostate cancers that are indolent and unlikely to develop into a life-threatening disease could cause significant avoidable side-effects to the patient. Therefore the aggressiveness of prostate cancer is a crucial factor in the course of the disease, and informs the appropriate treatment regimen.⁶ The detection and surveillance of prostate cancer in early stages represents a prudent strategy for management of prostate cancers.⁷

Prostate tumors are histologically and clinically heterogeneous,⁸ and tissue biopsy is necessary for confirmation of diagnosis.⁹ Measurement of prostate-specific antigen (PSA) levels followed by Digital Rectal Examination (DRE) is conventionally used to screen for prostate cancer; however, this method is prone to overdiagnosis and does not stratify cancer as slow-growing (indolent) or aggressive (high-risk). The over-reliance on PSA and DRE leads to an increase in biopsies, which carry a significant infection risk.¹⁰ To differentiate prostate cancers and reduce the inherent difficulties in manual histopathology, automated deep learning systems have been steadily evolving to stratify prostate cancer according to the Gleason grade.¹¹ Such methods work with tissue microarray or whole slide images of biopsies, and have been advanced as potential clinical aids to support reproducible pathologist grading.^12-19 Omics approaches could be useful in studying and characterizing the progression of prostate cancer and its conversion to an aggressive form.^20,21 Expression profiling (or transcriptomics) is useful to gain an understanding of the genetic factors of prostate cancer pathology.²² Epigenetic processes also contribute to the development of prostate cancer. DNA hypermethylation of CpG-rich gene promoter regions is widespread in prostate neoplasia.²³ Transcriptomics and mutational status-based genomics of prostate tumors might enable the development of personalized therapeutics.²⁴ Understanding and identifying Gleason grade-specific gene expression might be necessary for achieving precise personalized management of prostate cancers.
Following these cues, we have designed a computational study protocol for mining markers that could stratify Gleason risk of prostate tumors based on TCGA expression profiles.^25,26 We have used these biomarkers to develop machine learning models that screen prostate cancer and aid the differential diagnosis of indolent prostate cancers (Gleason grades I, II, and III) versus aggressive prostate cancers (Gleason grades IV and V).

Methods

Figure 1 summarizes the study design consisting of two major branches, the statistical –omics analysis and the network-based WGCNA analysis, which were merged to yield consensus biomarkers of Gleason grade-wise PRAD progression. These were used as features to develop machine learning models.

Figure 1.

PRADclass development. Determination of consensus biomarkers between statistical modeling and WGCNA analysis of the PRAD transcriptome provides the feature space for developing machine learning models of prostate cancer differentiation.

Data Acquisition and pre-Processing

RSEM-normalised PRAD RNASeq gene expression data from the TCGA was obtained from the firebrowse portal (gdac.broadinstitute.org_PRAD.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2016012800.0.0.tar.gz).²⁷ Associated clinical data was retrieved from the same portal (PRAD.Merge_Clinical.Level_1.2016012800.0.0.tar.gz). Three Gleason grade-related clinical attributes were of interest:

Primary Gleason pattern encoded as patient.stage_event.gleason_grading.primary_pattern

Secondary Gleason pattern encoded as patient.stage_event.gleason_grading.secondary_pattern

Gleason score (sum of the primary and secondary patterns) encoded as patient.stage_event.gleason_grading.gleason_score

Information about the primary and secondary patterns is necessary to disambiguate a Gleason score of, say, 7, which could arise from either 3 + 4 or 4 + 3, with the latter having a worse prognosis according to ISUP guidelines.²⁸ Using these clinical attributes, the sample Gleason grade was annotated (Table 1). The RNA-Seq data was matched with the clinical data using the sample “patient_bcr” information. The genes with nominal variation in expression across samples (σ < 1) were removed. Samples with missing grade annotation (or “NA”) were also removed. All data pre-processing was done with R.²⁹
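The two filtering steps described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study. The function name `filter_expression`, the nested-dictionary layout, the use of the population standard deviation, and the choice to drop ungraded samples before computing σ are all assumptions made for the example.

```python
from statistics import pstdev  # population SD; the article does not state which estimator was used


def filter_expression(expr, grades, min_sd=1.0):
    """Drop ungraded samples, then drop genes with SD below min_sd.

    expr:   {gene: {sample: expression value}}
    grades: {sample: Gleason grade, or None / "NA" when missing}
    Returns (filtered expression, list of retained samples).
    """
    # Keep only samples with a usable grade annotation.
    kept = sorted(s for s, g in grades.items() if g not in (None, "NA"))
    filtered = {}
    for gene, values in expr.items():
        row = [values[s] for s in kept]
        # Retain genes whose expression varies enough across the retained samples.
        if pstdev(row) >= min_sd:
            filtered[gene] = dict(zip(kept, row))
    return filtered, kept
```

Because the order of the two steps is an assumption here, a gene whose variation comes only from ungraded samples is removed in this sketch.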

Table 1.

Grading of Prostate Cancer Samples, Based on ISUP Guidelines.

Primary Gleason Pattern	Secondary Gleason Pattern	Gleason Score	Determined Gleason Grade
3	3	6	1
3	4	7	2
4	3	7	3
3 or 4 or 5	5 or 4 or 3, respectively	8	4
4 or 5	4 or 5	9 or 10	5
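
The mapping in Table 1 can be written as a small lookup function. The sketch below is illustrative Python, not the study's R code; the name `gleason_grade` is hypothetical, and patterns outside 3 to 5 are rejected because Table 1 does not cover them.

```python
def gleason_grade(primary, secondary):
    """Map primary and secondary Gleason patterns to the grade in Table 1."""
    if primary not in (3, 4, 5) or secondary not in (3, 4, 5):
        raise ValueError("Table 1 covers only Gleason patterns 3, 4 and 5")
    score = primary + secondary
    if score == 6:
        return 1
    if score == 7:
        # 3 + 4 and 4 + 3 share a score but not a grade.
        return 2 if primary == 3 else 3
    if score == 8:
        return 4
    return 5  # score of 9 or 10
```

The score-7 branch is the reason the primary and secondary patterns are needed in addition to the Gleason score.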

Linear Modeling with PRAD Grade

A grade-wise linear model of gene expression was constructed using the R limma package³⁰:

y = α + β_{1} X_{1} + β_{2} X_{2} + β_{3} X_{3} + β_{4} X_{4} + β_{5} X_{5}

(1)

where the X_i are indicator variables for Gleason grade i, the intercept α represents the baseline expression in control samples, and the β_i signify the grade-wise log-fold change (lfc) in expression relative to control samples. Empirical Bayes adjustment³¹ and correction for multiple hypothesis testing³² were carried out to yield a grade-wise account of differentially expressed genes ranked by significance. This model facilitated the contrast between each grade and controls (Table 2), and genes with |lfc| < 2 in any grade were eliminated.

Table 2.

Contrast Matrix with Control.

Annotation	Grade 1 - Control	Grade 2 - Control	Grade 3 - Control	Grade 4 - Control	Grade 5 - Control
Control	−1	−1	−1	−1	−1
Grade 1	1	0	0	0	0
Grade 2	0	1	0	0	0
Grade 3	0	0	1	0	0
Grade 4	0	0	0	1	0
Grade 5	0	0	0	0	1

The mean expression of samples in each grade is contrasted with the mean expression in control samples, to obtain grade-specific log-fold changes and their significance.
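
For a model with an intercept and one indicator per grade, the least-squares estimates reduce to group means: α is the control mean and each β_i is the grade mean minus the control mean. The sketch below shows this for a single gene in illustrative Python. It does not reproduce limma's empirical Bayes moderation or the multiple-testing correction, and the function name `grade_lfc` and the label strings are assumptions.

```python
from statistics import mean


def grade_lfc(log_expr, labels, control="Control"):
    """Estimate alpha and the grade-wise beta_i of eqn. (1) for one gene.

    log_expr: log2 expression values, one per sample
    labels:   group label of each sample, in the same order
    """
    groups = {}
    for value, label in zip(log_expr, labels):
        groups.setdefault(label, []).append(value)
    alpha = mean(groups[control])  # baseline expression in controls
    # Each beta is a contrast of the form "Grade i - Control" (Table 2).
    betas = {g: mean(v) - alpha for g, v in groups.items() if g != control}
    return alpha, betas
```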

Pairwise Contrasts Modeling

To identify grade-salient genes, we reformulated the model of gene expression given by eqn. (1):

y = β_{0} x_{0} + β_{1} x_{1} + β_{2} x_{2} + β_{3} x_{3} + β_{4} x_{4} + β_{5} x_{5}

(2)

where the x_i are indicator variables encoding the sample grade, and β_i correspond to estimated weights for each grade.³³ The contrast matrix for the between-grade comparisons is shown in Table 3.

Table 3.

Contrast Matrix for Executing Between-Grade Contrasts Using the contrasts.fit Function in limma.

Derived Annotation	Grade 1 − Grade 2	Grade 1 − Grade 3	Grade 1 − Grade 4	Grade 1 − Grade 5	Grade 2 − Grade 3	Grade 2 − Grade 4	Grade 2 − Grade 5	Grade 3 − Grade 4	Grade 3 − Grade 5	Grade 4 − Grade 5
Grade 1	1	1	1	1	0	0	0	0	0	0
Grade 2	−1	0	0	0	1	1	1	0	0	0
Grade 3	0	−1	0	0	−1	0	0	1	1	0
Grade 4	0	0	−1	0	0	−1	0	−1	0	1
Grade 5	0	0	0	−1	0	0	−1	0	−1	−1

Five grades of cancer progression implied ⁵C₂ = 10 possible pairwise contrasts, with each grade participating in four of them.
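
The between-grade contrast matrix can be generated programmatically rather than written by hand. The following is a small Python sketch of that construction; the study itself used limma in R, and the function name `between_grade_contrasts` is hypothetical.

```python
from itertools import combinations

GRADES = [1, 2, 3, 4, 5]


def between_grade_contrasts(grades=GRADES):
    """Enumerate all pairwise contrasts and build the matching contrast matrix.

    Returns (pairs, matrix), where matrix[grade] is that grade's row of
    coefficients, one entry per contrast "a - b".
    """
    pairs = list(combinations(grades, 2))
    matrix = {g: [0] * len(pairs) for g in grades}
    for col, (a, b) in enumerate(pairs):
        matrix[a][col] = 1    # first grade of the contrast
        matrix[b][col] = -1   # grade being subtracted
    return pairs, matrix
```

With the five grades in ascending order, the columns come out in the same order as in Table 3.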

Detection of Grade-Salient Genes

Grade-salient genes were flagged using a sequence of five filters:

(i) A stringent adj. p-value (< 0.001) of the contrast against controls was applied to filter the significant differentially expressed genes. Such genes were assigned to the grade showing maximum |lfc|, producing grade-specific genes.

(ii) - (v) Significance of contrast with reference to other grades (adj. P-value < 0.05) was applied to the grade-specific genes, and those that passed all four relevant contrasts (out of the ten possible pairwise contrasts) were identified as grade-salient genes.
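
The filter sequence can be sketched for a single gene as below. This is illustrative Python under stated assumptions: the contrast against controls is summarized by one adjusted p-value per gene, pairwise p-values are keyed by ordered grade pairs, and the name `salient_grade` is hypothetical.

```python
def salient_grade(lfc, adj_p_control, adj_p_pairwise, p_control=0.001, p_pair=0.05):
    """Return the grade for which a gene is salient, or None.

    lfc:            {grade: log-fold change relative to control}
    adj_p_control:  adjusted p-value of the contrast against controls
    adj_p_pairwise: {(lower grade, higher grade): adjusted p-value}
    """
    # Filter (i): significant differential expression against controls.
    if adj_p_control >= p_control:
        return None
    # Assign the gene to the grade showing the maximum |lfc|.
    grade = max(lfc, key=lambda g: abs(lfc[g]))
    # Filters (ii)-(v): the four pairwise contrasts involving that grade.
    for other in lfc:
        if other == grade:
            continue
        pair = (min(grade, other), max(grade, other))
        if adj_p_pairwise[pair] >= p_pair:
            return None
    return grade
```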

Monotonic Expression

To ascertain an ordinal trend in mean expression across grades, we used a model with the PRAD grade as numeric variable X:

Y = a X + b

(3)

To identify genes whose expression is in step (monotonic) with PRAD progression, we checked for the following two patterns with respect to mean gene expression:

control < I < II < III < IV < V; signifying monotonic upregulation

control > I > II > III > IV > V; signifying monotonic downregulation
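
The two patterns amount to a strict ordering test on the vector of group means. A minimal Python sketch follows; the function name `monotonic_trend` is hypothetical, and the significance test from the numeric model of eqn. (3) is not reproduced.

```python
def monotonic_trend(means):
    """Classify mean expression ordered as [control, I, II, III, IV, V].

    Returns "up" for strictly increasing means, "down" for strictly
    decreasing means, and None otherwise.
    """
    steps = list(zip(means, means[1:]))
    if all(a < b for a, b in steps):
        return "up"
    if all(a > b for a, b in steps):
        return "down"
    return None
```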

Gleason Biomarker Identification Using WGCNA

The TCGA PRAD RNAseq data was normalized using voom and checked for outliers.³⁴ Genes were ranked based on the median absolute deviation in expression across the samples, and the top 5000 genes were identified. The expression subset corresponding to these genes was annotated with the sample Gleason grade, and used to construct Gleason-grade specific co-expression networks using WGCNA,³⁵ yielding five networks. For each such network, the smallest soft-thresholding power yielding a scale-free topology goodness-of-fit R² > 0.9 was identified as the β value using WGCNA's “pickSoftThreshold.” With the β value fixed, we used Pearson's ρ to calculate the similarity matrix corresponding to each network, which was then converted into the adjacency matrix, the topological overlap matrix (TOM) and finally the dissimilarity matrix (given by 1-TOM). Hierarchical clustering was applied to the dissimilarity matrix, and a dynamic tree cut algorithm with a cut height of 0.25 was used to partition the gene space into 20 co-expression modules.
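
The chain from similarity to dissimilarity can be sketched on a toy scale as follows. This is plain Python for illustration, not the WGCNA implementation. It assumes an unsigned network, that is, adjacency a_ij = |cor|^β, and the standard unsigned TOM formula; the article does not state the network type. The function names are hypothetical, and `pick_power` takes the scale-free fit R² values as given instead of computing them.

```python
from math import sqrt


def pick_power(r2_by_power, threshold=0.9):
    """Smallest candidate power whose scale-free fit R^2 exceeds the threshold."""
    passing = [p for p, r2 in r2_by_power.items() if r2 > threshold]
    return min(passing) if passing else None


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def adjacency(rows, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta, with a unit diagonal."""
    n = len(rows)
    return [[1.0 if i == j else abs(pearson(rows[i], rows[j])) ** beta
             for j in range(n)] for i in range(n)]


def tom_dissimilarity(adj):
    """Return 1 - TOM, where TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    # Connectivity of each gene, excluding the diagonal.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Connection strength shared through all other genes.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            diss[i][j] = 1.0 - tom
    return diss
```

The resulting dissimilarity matrix is what hierarchical clustering would take as input; the clustering and dynamic tree cut steps are not shown.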

The association between each module and the clinical trait of interest (namely, the Gleason grade) was analyzed to yield the following:

gene significance (GS), defined as the Pearson's ρ between gene expression and the trait of interest, ie, Gleason grade (in other words, cor(gene_i, Trait));

module membership (MM), defined as the degree of association of a gene in a module with all the other genes in that module, ie, the degree of connectivity of the gene within the module;

module significance (MS), defined as the mean of the unsigned GS of all genes in a given module; and

module eigengene (ME), defined as the first principal component of the module-specific gene expression matrix.

Among all the modules for a given grade, the module with the largest MS value was designated as the Significant Module of that grade. Among all the genes of a given Significant Module, those with module membership MM > 0.7 and unsigned GS > 0.5 were identified as trait-specific key genes. The trait-specific key genes were ranked by unsigned GS.
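
The selection rules above can be sketched as follows, given precomputed MM and GS values. This is illustrative Python with hypothetical function names; it treats MM as a signed quantity compared directly against 0.7, which is an assumption, since the article does not say whether MM was taken as unsigned.

```python
def module_significance(gs_values):
    """MS: mean of the unsigned GS over the genes of one module."""
    return sum(abs(v) for v in gs_values) / len(gs_values)


def significant_module(gs_by_module):
    """Name of the module with the largest MS."""
    return max(gs_by_module, key=lambda m: module_significance(gs_by_module[m]))


def key_genes(module_stats, mm_cut=0.7, gs_cut=0.5):
    """Trait-specific key genes of a module, ranked by unsigned GS.

    module_stats: {gene: (MM, GS)}
    """
    hits = [(gene, gs) for gene, (mm, gs) in module_stats.items()
            if mm > mm_cut and abs(gs) > gs_cut]
    hits.sort(key=lambda item: abs(item[1]), reverse=True)
    return [gene for gene, _ in hits]
```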

Integrative Analysis

The intersection of the grade-salient genes from –omics analysis and the trait-specific key genes from WGCNA was designated as the Consensus Genes. KEGG³⁶ and Gene Ontology were used as reference databases for analyzing functional enrichment.³⁷ The grade-salient DEGs were visualized using heatmap dendrograms, and the dispersion of expression in cancer samples was observed using volcano plots (-log₁₀ transformed p-value vs log fold-change). The novelty of the identified grade-salient genes was ascertained against the Cancer Gene Census v84,³⁸ Network of Cancer Genes,³⁹ and the Clinical Trial Registry (www.clinicaltrials.gov). The top ten trait-specific key genes were used to reconstruct networks using STRINGdb.⁴⁰
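
The consensus step is a grade-wise set intersection. A minimal Python sketch is given below; the function name `consensus_genes` and the dictionary layout are assumptions.

```python
def consensus_genes(salient_by_grade, key_by_grade):
    """Grade-wise intersection of grade-salient and trait-specific key genes.

    Both arguments map a grade to an iterable of gene symbols.
    """
    grades = sorted(set(salient_by_grade) | set(key_by_grade))
    return {
        g: sorted(set(salient_by_grade.get(g, ())) & set(key_by_grade.get(g, ())))
        for g in grades
    }
```

A grade present in only one of the two inputs yields an empty consensus list.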

Machine Learning for Modeling High-Risk Aggressive Prostate Cancers

Gleason grade-specific biomarkers might make for good features in screening benign hyperplasia from cancerous samples, and differentiating between indolent and aggressive prostate cancers. To address this problem, a ternary outcome vector was encoded, labeling normal samples, indolent PRAD samples (grades 1, 2, and 3) and aggressive PRAD samples (grades 4 and 5). The consensus genes from integrative analysis were used as the feature space for training Random Forest,^41,42 SVM,⁴³ neural networks and XGBoost models.⁴⁴ Ten-fold cross-validation was used to optimize model hyperparameters. After hyperparameter optimization, the classifiers were evaluated using their average performance across all the folds, and the best-performing model was identified. This model was rebuilt using the full dataset, and validated on external datasets. The validated model, called PRADclass, was deployed on-line as an R Shiny app⁴⁵ and also as a command-line interface.
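
The ternary encoding and the balanced accuracy metric can be sketched as below. This is illustrative Python, not the PRADclass code. Using grade 0 to denote a normal sample is an assumption of the example, and balanced accuracy is taken in its usual sense as the mean of per-class recall, which the article does not define explicitly.

```python
def outcome_class(grade):
    """Encode the ternary outcome; grade 0 denotes a normal sample (assumed)."""
    if grade == 0:
        return "normal"
    if grade in (1, 2, 3):
        return "indolent"
    if grade in (4, 5):
        return "aggressive"
    raise ValueError(f"unexpected grade: {grade!r}")


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so that each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, label in enumerate(y_true) if label == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)
```

Averaging recall per class keeps the smaller normal class from being swamped by the more numerous cancer samples.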

Results

TCGA PRAD samples had been collected from 32 tissue source sites, and processed at either the International Genomics Consortium or the Nationwide Children's Hospital Biospecimen Core Resource. All the samples had been profiled at the University of North Carolina Genome Characterization Center, using IlluminaHiSeq_RNASeqV2. The original gene expression matrix consisted of 20,531 genes×551 samples. After preprocessing we obtained a final dataset of 18,327 genes×538 samples, with 52 controls and 50, 149, 95, 70 and 123 samples in ISUP-defined I, II, III, IV and V Gleason-grades, respectively. This dataset is available in Supplementary File S1.

The dataset was log₂-transformed and observation-weighted using the limma voom function. The linear model (eqn. 1) yielded 15257 significant genes (adj. P < 0.05). A more stringent significance (adj. P < 1E-05) yielded a total of 7965 significant genes. The top ten DE genes of linear modeling, ranked by adj. P-value, are all upregulated (Table 4) and the detailed results for all genes are provided in Supplementary File S2. Figure 2 presents the grade-wise distribution of expression values of the top ten DE genes of linear modeling. Principal components of the expression patterns of the top 100 genes were calculated and found to be effective in segregating cancer samples from control samples (Figure 3). On the contrary, the principal components of the expression patterns of a random subset of 100 genes failed to demonstrate such a separation. Graphical summaries of the grade-wise distribution of expression values of the top 200 linear model genes are provided in Supplementary File S3.

Figure 2.

Distribution of expression values of the top ten linear model genes. The trend in gene expression could be either upregulation or downregulation in cancer relative to control, however all the top ten genes shown here are upregulated.

Figure 3.

Principal components analysis of cancer versus control. (A) The first two principal components of 100 randomly chosen genes failed to show a distinct clustering of the control versus cancer samples. (B) The first two principal components of the top 100 genes from linear modeling demonstrated an independent clustering of the control samples (red) and cancer samples (colored by grade). It is also seen that the control samples are tightly localized in this map, whereas the cancer samples tend to diffuse and spread out, reflecting heterogeneity in individual manifestations of prostate cancer.

Table 4.

Top ten Genes of the Linear Model, Ranked by Significance

Gene	Grade 1 lfc (β₁)	Grade 2 lfc (β₂)	Grade 3 lfc (β₃)	Grade 4 lfc (β₄)	Grade 5 lfc (β₅)	Adj. P-val	Regulation Status
HOXC6	3.30	3.60	4.32	4.13	4.33	1.25E-52	UP
SIM2	3.06	3.06	2.99	3.22	2.96	4.16E-51	UP
EPHA10	2.05	2.26	2.35	2.59	2.34	1.62E-50	UP
HPN	2.78	2.83	3.14	3.05	2.58	7.29E-49	UP
PYCR1	1.47	1.53	1.68	1.70	1.59	3.37E-47	UP
SLC19A1	1.72	1.64	1.66	1.62	1.48	2.47E-46	UP
TROAP	1.27	1.49	1.90	2.77	2.94	2.98E-46	UP
LRFN1	1.62	1.56	1.69	1.80	1.79	2.98E-46	UP
EZH2	1.39	1.44	1.60	2.10	2.11	1.04E-45	UP
CGREF1	2.35	2.34	2.59	2.51	2.34	7.61E-44	UP

The lfc of mean gene expression in each grade relative to control samples is given, followed by adj. P, and inferred status of regulation in tumor.

In order to identify grade-specific genes, the significant DE genes were partitioned into different sets, shown as an Upset plot⁴⁶ in Figure 4. We then applied the filtering conditions necessary for salience, and obtained ten grade I-salient, two grade II-salient, one grade III-salient, 34 grade IV-salient and 30 grade V-salient genes. The top 10 grade-salient genes in each grade are shown in Table 5. The full set of 77 grade-salient genes is documented in Supplementary File S4. A heatmap of the grade-salient genes is shown in Figure 5, and revealed a mostly systematic trend in their expression relative to the controls, with a representation of both up- and down-regulation processes. A dendrogram of the grade-annotated samples yielded a binary separation between high-grade and low-grade cancers. A dendrogram of the grade-salient genes reaffirmed this trend, showing a co-clustering among genes specific to indolent cancers (grades 1, 2, and 3) and a co-clustering among genes specific to aggressive cancers (grades 4 and 5). A volcano plot of the differentially expressed genes is shown in Figure 6.

Figure 4.

Gleason-grade distributions of the computationally identified grade-salient biomarkers, sorted by membership size. An enrichment in the advanced grades is observable.

Figure 5.

Heatmap of the expression of grade-salient genes, with clustering in both axes. Only the top ten grade-salient genes in each grade were used to construct the heatmap (as in Table 5). The downregulation of all the ten grade-4 salient genes is evident.

Figure 6.

Illustration of the significant DE genes using a volcano plot. The grade-salient genes are annotated.

Table 5.

The top Grade-Salient Genes, by Grade.

Rank	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5
1	PHGR1	PPM1E	PLA2G2A	UBXN10	CBX2
2	OR52R1	PPYR1		ATCAY	FAM72B
3	PCOTH			ANGPTL1	SLC7A4
4	SLC43A1			DGKG	ASPN
5	ANXA8L2			NECAB1	UBE2C
6	RNF157			ASB2	SHCBP1
7	MUC15			KIAA1644	TPX2
8	SLC46A2			LOC2824276	IL1F5
9	PTPRZ1			PCYT1B	CTHRC1
10	SERPINB5			RNF175	CD38

The genes are ordered by the adj. p-values of the linear model.

The top 200 genes from the numeric model (eqn. 3) are provided in Supplementary File S5. We obtained 307 monotonically expressed genes (MEGs), with 112 significantly monotonic-upregulated and 126 significantly monotonic-downregulated genes. The MEGs are ranked by significance, and provided in Supplementary File S5. The intersection of significant MEGs with grade-specific genes yielded three genes: one grade-4 specific gene, SERPINA5, and two grade-5 specific genes, DPYS and NKX6-1. There were eleven significant MEGs in the top 200 genes from the numeric model, namely RCBTB2, VPS36, PLCD1, HRNBP3, NUPL2, SPAG5, SCRIB, AGL, CAMK2G, WHSC1, and ZNF706.

Gene set enrichment analysis of the grade-salient genes was performed on GO and KEGG (Supplementary File S6). A search of the grade-salient genes against the Cancer Gene Census (CGC) did not turn up any hits, suggesting that our findings could constitute novel advances in the context of prostate cancer progression. Screening the grade-salient genes against the Network of Cancer Genes (NCG) curated database of cancer drivers and healthy drivers yielded two hits, namely ANXA8L2 (grade-1 salient) and TPX2 (grade-5 salient). TPX2 is annotated as a “putative oncogene,” which agrees with the upregulation observed here. Searching the grade-salient genes in ClinicalTrials.gov, we found that eight of the grade-salient genes were used as either targets or endpoints of clinical trials (Supplementary File S7). Three among these eight (namely MUC15, UBE2C, and CD38) were targeted in prostate cancer-related clinical trials, lending weight to our findings. The top 200 genes from the linear model (eqn. 1) were searched against the Human Protein Atlas using: “cancer related genes” or “found as prognostic markers in other cancers,” yielding the following: VAV2, BRAF, EZH2, MEN1, BUB1B, and MNX1. BRAF and EZH2 were recently shown to promote castration-resistant prostate cancers,^47,48 concordant with the upregulation noticed in cancer samples here. VAV2 has been tied to poor prognosis in prostate cancer, corroborated by its oncogenic role observed here.⁴⁹ MEN1 has been noted as a tumor suppressor gene with protective effects in aggressive tumors, via increased gene expression in prostate cancer cells, thanks to inclusion in a high copy number gain region,⁵⁰ which is corroborated by the upregulation in its expression noted here. Oncogenic BUB1B was necessary for fast proliferation of tumor cells,⁵¹ while oncogenic MNX1 has been associated with aggressive disease,⁵² both results supported by the upregulation found here.
The monotonic grade-salient genes in the advanced aggressive stages of PRAD, namely DPYS, NKX6-1, and SERPINA5, are documented in the prostate cancer literature. Aberrant hypermethylation of DPYS has been found to have prognostic value in stratifying early prostate cancers,⁵³ discordant with the overexpression observed here. NKX6-1, observed here as an oncogene with respect to prostate adenocarcinoma, has been documented with opposing functional roles depending on cancer cell-type: as tumor suppressor in colorectal cancer, gastric cancer, acute lymphoblastic leukemia, B-cell lymphoma; and as oncogene in hepatocellular carcinoma and breast cancer.⁵⁴ Loss of expression of SERPINA5 is correlated with high-grade prostate tumors and adverse prognosis,⁵⁵ concordant with the significant downregulation observed in the cancer samples here. The overlap of the top 200 MEGs with the top 200 of the linear model yielded three genes (SPAG5, ZNF706, SERPINA5), and the overlap of the top 200 MEGs with the top 200 of the ordinal model (eqn. 3) yielded eleven genes: SPAG5, ZNF706, RCBTB2, VPS36, PLCD1, HRNBP3, NUPL2, SCRIB, AGL, CAMK2G, and WHSC1. A discussion of the grade-salient genes and the distribution of their expression values in different Gleason grades are provided in Supplementary File S8.

Trait-Specific key Genes from Grade-Informed WGCNA

WGCNA has been used to analyze stage-specific cancer expression.⁵⁶ Here we used WGCNA for the analysis of Gleason grade-specific cancer expression. Grade-wise preprocessed datasets were used to reconstruct grade-specific weighted correlation networks using R WGCNA (Supplementary File S9). The β parameter was identified as 14, 9, 7, 3, and 4 for Gleason grade-I, grade-II, grade-III, grade-IV, and grade-V networks, respectively. This identification of the optimal scale-free network is illustrated for grade-I in Figure 7. Modular decomposition was effected by the dynamic tree cut algorithm, and the outlier genes in each grade were binned into a gray module. The GS and MM of each gene were calculated, and presented grade-wise in Supplementary File S10. Figure 8 shows the module-trait correlation of all modules in each grade as a heatmap, for which the source data is given in Supplementary File S11.

Figure 7.

Finding the soft-thresholding power in WGCNA analysis, illustrated for grade-I network. (A) Scale-free fit of degree distribution (ie, p(k) ∼ k^−γ) for various soft-thresholding powers to identify the optimal β. (B) Change in mean connectivity with β.

Figure 8.

Heatmap of grade-wise module-trait relationships, depicting correlation (with P-value) between module eigengene (ME) and grade for each module in the given grade. (A) Grade-I, (B) Grade-II, (C) Grade-III, (D) Grade-IV, and (E) Grade-V. The Significant Module in each grade is highlighted. Strength of correlation is indicated with a color gradient.

Based on the module significance (MS) values, the Significant Module of each grade was identified. A scatter of MM versus GS for all genes in each Significant Module (Figure 9) showed a strong Pearson's correlation across all the Significant Modules. The trait-specific key genes in each Significant Module were determined, yielding 67 genes in Grade-I, 46 genes in Grade-II, 204 genes in Grade-III, 603 genes in Grade-IV, and 83 genes in Grade-V (presented in Supplementary File S12 and summarized in Table 6).

Figure 9.

Grade-wise module significance, along with the scatter of genewise correlation between MM and GS in the grade's significant module. (A) Grade-I; (B) Grade-II; (C) Grade-III; (D) Grade-IV; and (E) Grade-V.

Table 6.

Summary of Gleason Grade-Wise WGCNA Analysis of PRAD Transcriptome.

Grade	β	No. of Modules	Significant Module	# Genes in Sig. Module	Key Genes in Sig. Module	ME-Trait Correlation	Significance of Trait Corr. (P)
I	14	10	Pink	79	67	0.44	1e-05
II	9	12	Yellow	381	46	−0.58	6e-19
III	7	12	Blue	843	204	−0.63	1e-16
IV	3	12	Turquoise	1879	603	−0.72	3e-18
V	4	11	Yellow	336	83	0.68	3e-24

The top ten trait-specific key genes in the significant module in each grade were used to reconstruct driver networks using STRINGdb. The trait-specific key genes of Grade-1 WGCNA network were enriched in the catabolic processes of fatty acids and lipids (P < 1E-04). The trait-specific key genes of Grade-2 WGCNA network showed a KEGG enrichment of PPAR signaling pathway (P < 1E-2), altering androgen activity thereby driving PRAD progression.⁵⁷ The trait-specific key genes of Grade-3 WGCNA network revealed a Reactome enrichment⁵⁸ for neddylation pathway (P < 1E-04), which increases androgen receptor transcription and promotes the growth and invasion of the prostate cancer cells.⁵⁹ The trait-specific key genes of Grade-4 WGCNA network showed an enrichment in post-translational SUMOylation modifications in the PTEN/AKT and androgen-receptor signaling pathways (P < 2E-3).⁶⁰ An analysis with Reactome revealed enrichment in PPARA gene expression (P < 0.05), known to drive advanced prostate cancer.⁶¹ The top ten trait-specific key genes of Grade-5 WGCNA network were enriched in PTEN transcription regulation (P < 1E-04), whose loss of function is well-known to be associated with aggressive and metastatic prostate cancer.⁶² An analysis with Reactome showed significance for HCMV early events (P < 1E-3), known to be involved in the etiology of metastatic prostate carcinoma.⁶³ The grade-wise detailed results are presented in Supplementary File S13.

We now turn our attention to a discussion of the grade-salient genes identified from statistical modeling.

Grade-I salient DEGs. PHGR1 (proline-, histidine-, and glycine-rich 1), encoding a protein of unknown function, was found to be regulated by siRNA markers of castration-resistant prostate cancer-like cells.⁶⁴ MUC15 and PTPRZ1 genes have been documented as downregulated in PCa; specifically, MUC15 expression was negatively correlated with the Gleason score.^65,66 The aberrant overexpression of ANXA8L2, SLC43A1 (also called LAT3), RNF157, and PCOTH is known to be involved in macrophage M2 polarization, TAF-Iβ pathway, remodeling of extracellular matrix and other pathways in prostate cancer.^67-70 The overexpression of LAT3 was also associated with poor prognosis in prostate cancer.⁷¹ Our findings revealed that these four genes are upregulated, agreeing with the literature. Overexpression of SERPINB5 was associated with better prognosis in PCa patients,⁷² signifying a healthy protective effect and agreeing with our findings that its downregulation is necessary for Gleason-grade progression of prostate tumors. OR52R1, an olfactory receptor, played a role in prostate cancer progression through activation of the PI3K pathway,⁷³ correctly identified as upregulated here.

Grade-II salient DEGs. PPM1E is essential for chromosome segregation during mitosis, and identified as an oncogene here. The association of PPM1E with PAK1 promotes cell division, tumor growth, and microinvasion.^74,75 Prostate cancer-specific differential expression of PPYR1, a neuropeptide, has been noted previously,⁷⁶ agreeing with the overexpression observed here.

Grade-III salient DEGs. Overexpression of PLA2G2A (Phospholipase A2 Group IIA) has been identified as a prognostic biomarker of androgen-independent prostate cancer.⁷⁷ In contrast, it was also found to be differentially downregulated in metastatic prostate cancers.⁷⁸ Our findings supported the latter study.

Grade-IV salient DEGs. UBXN10 plays a key role in ciliogenesis, the dysregulation of which initiates tumorigenesis.^79,80 Recently, UBXN10 has been identified in an eight gene signature for predicting progression-free survival, exerting a protective effect in prostate cancer,⁸¹ which accords with the tumor suppressor activity noted here. There is ample literature evidence supporting the downregulation of ANGPTL1, RNF175, and ASB2 in prostate cancer that is found here.^82-86 Liu et al have identified KIAA1644 as a protective prognostic gene in endometrial cancer,⁸⁷ reinforced by the downregulation obtained from the analysis of the PRAD TCGA data here. Upregulation of PCYT1B has been documented in endometrial cancer,⁸⁸ suggesting an analogous oncogenic role in PCa as found in our investigations. DGKG has been identified as a tumor suppressor in colorectal cancer,⁸⁹ and its downregulation observed in our studies might point to its similar significant role in prostate cancer.

Grade-V salient DEGs. It has been recently reported that TBX2 and CBX2 were upregulated in advanced PCa, helping sustain an androgen deprivation condition through neuro-endocrine differentiation, and thereby promoting migration of PCa cells and facilitating invasion.^90-93 CTHRC1 and SHCBP1 have been documented to be significantly upregulated in PCa tissues, and moreover the latter was associated with poor survival outcomes.^89,94 Certain grade-5 salient genes like ASPN, CTHRC1, CD38 were found to be associated with metastatic PCa, with higher expression in the stroma or the tumor microenvironment, highlighting prognostic potential.^95-97 The evidence from the present work corroborated all the above studies. UBE2C was also found to be an independent prognostic factor in prostate cancer, exhibiting increasing expression with Gleason grade, justifying its identification as Grade-V salient.⁹⁸

Discussion

We have executed parallel workflows for the statistical and network-based modeling of the TCGA PRAD transcriptome, stratified by Gleason grade. The concurrence of the results from the two workflows might represent reliable findings. The following were observed:

All the grade-salient genes were located either in the significant modules or in a module highly correlated with the grade;

Grade-salient genes displaying significant upregulation with Gleason progression yielded a significant positive GS in the WGCNA analysis, except MAL;

Grade-salient genes displaying significant downregulation with Gleason progression yielded a significant negative GS in the WGCNA analysis, except four: COL10A1, NOX4, FAP, and SFRP4.

Regulation status of trait-specific key genes suggested by the sign of the GS was concordant with the inference from the statistical expression patterns with respect to controls, with one exception: LOC100128675.

All the grade-salient genes reflected a strong and significant association with the WGCNA modules in the respective grades (MM > 0.4, P < 0.05).

The grade-wise intersection between the salient genes from the statistical modeling and the trait-specific key genes from the network modeling produced two genes from Grade-1 (SLC43A1, PHGR1), 26 genes from Grade-4 (including C2orf88, ANGPT1, CAV2, TMLHE, IGSF1, PPARGC1A, LGR6, PPP1R3C, FRMD6, and NECAB1) and seven genes from Grade-5 (CBX2, FAM72B, SHCBP1, TMEM132A, TPX2, DPYS, and UBE2C). The regulation status of these 35 consensus genes was concordant between the two approaches, underscoring their potential utility. A summary of the consensus genes is presented in Table 7, and the detailed results of the consensus analysis are included in Supplementary File S14. A gene-wise analysis of the top ten trait-specific key genes in each module is presented in Supplementary File S15. The main inference remained unchanged: the bulk of the trait-specific key genes were also grade-salient, with concordant inferred regulation. It is striking that most of the consensus emerged from the aggressive forms of prostate cancer (ie, Gleason grades 4 and 5). We used the 33 consensus genes from the aggressive grades to reconstruct a STRINGdb network, with 50 interactors in the first shell and 10 interactors in the second shell. This yielded a significantly enriched PPI network with 542 edges (P < 1.0E-16; Figure 10). An analysis with KEGG revealed significant enrichment in oncogenic pathways such as the NF-kappa B signaling pathway (P < 1E-3) and the p53 signaling pathway (P ∼ 0.02). An analysis with Reactome showed significant enrichment in cell cycle processes. The detailed results of these analyses are presented in Supplementary File S16. The results of the –omics analysis and WGCNA were consistent, which amplifies the significance of the findings and sets the stage for modeling the character of patient samples (benign, indolent or aggressive).

Figure 10.

Network reconstructed using the 33 consensus biomarkers of the aggressive grades of prostate cancer. A giant component with a clique-like core could be seen. It is remarkable that 25 of the consensus biomarkers were isolated outliers, signifying their roles underpinning varied biological processes not immediately related to each other.

Table 7.

Grade-Wise Consensus Biomarkers from the Intersection Consensus Analysis of Grade-Salient Genes and Trait-Specific Key Genes.

Gene	Grade	Significant Module	MM	MM P-Value	GS	GS P-Value	Inferred Regulation	Consensus With –omics
PHGR1	1	MEpink	0.8634	6.04E-28	0.7020	1.27E-14	UP	Yes
SLC43A1	1	MEpink	0.8964	7.57E-33	0.6973	2.25E-14	UP	Yes
LOC100128675	4	MEturquoise	0.9258	1.35E-46	0.8333	4.66E-29	UP	Yes
NECAB1	4	MEturquoise	0.9218	1.95E-45	−0.7263	5.80E-19	DOWN	Yes
UBXN10	4	MEturquoise	0.94228	3.49E-52	−0.7169	2.62E-18	DOWN	Yes
SERPINA5	4	MEturquoise	0.9176	2.80E-44	−0.6927	9.95E-17	DOWN	Yes
CLU	4	MEturquoise	0.9368	3.74E-50	−0.6868	2.29E-16	DOWN	Yes
DGKG	4	MEturquoise	0.893	1.48E-38	−0.6664	3.50E-15	DOWN	Yes
NCAM1	4	MEturquoise	0.89	6.16E-38	−0.6642	4.59E-15	DOWN	Yes
ANGPTL1	4	MEturquoise	0.91043	1.96E-42	−0.6576	1.06E-14	DOWN	Yes
SYNPO2	4	MEturquoise	0.92	6.66E-05	−0.6456	4.57E-14	DOWN	Yes
EMX2OS	4	MEturquoise	0.8555	4.47E-32	−0.6446	5.10E-14	DOWN	Yes
TMEM132A	5	MEyellow	0.8011	1.29E-38	0.6821	3.37E-24	UP	Yes
CBX2	5	MEyellow	0.7818	1.11E-35	0.6497	2.11E-21	UP	Yes
UBE2C	5	MEyellow	0.9448	6.98E-82	0.6402	1.21E-20	UP	Yes
TPX2	5	MEyellow	0.9698	5.54E-103	0.6297	7.72E-20	UP	Yes
FAM72B	5	MEyellow	0.9228	2.94E-70	0.6194	4.49E-19	UP	Yes
SHCBP1	5	MEyellow	0.8797	3.93E-55	0.5684	1.12E-15	UP	Yes

All genes have maximum membership values for the significant module in each Grade. The inferred regulation is based on the sign of the GS, which denotes the correlation between gene expression and trait class (Gleason grade) of interest. Only the top ten genes from Grade-4 (ranked by GS) are shown. Consensus with regulation inferred from the statistical modeling is also ascertained.
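The consensus criteria described above (MM > 0.4 with P < 0.05, and regulation inferred from the sign of the GS) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which was R-based. The rows are a subset copied from Table 7, the `STAT_DIRECTION` mapping is an assumption taken from the narrative on grade-salient DEGs, and the function names and the requirement of GS P < 0.05 are hypothetical choices made for illustration.

```python
# Subset of rows copied from Table 7:
# (gene, grade, MM, MM P-value, GS, GS P-value)
TABLE7_ROWS = [
    ("UBXN10", 4, 0.94228, 3.49e-52, -0.7169, 2.62e-18),
    ("DGKG", 4, 0.893, 1.48e-38, -0.6664, 3.50e-15),
    ("CBX2", 5, 0.7818, 1.11e-35, 0.6497, 2.11e-21),
    ("UBE2C", 5, 0.9448, 6.98e-82, 0.6402, 1.21e-20),
    ("SHCBP1", 5, 0.8797, 3.93e-55, 0.5684, 1.12e-15),
]

# Assumption: direction of regulation from the statistical modeling,
# as described in the text on grade-salient DEGs.
STAT_DIRECTION = {
    "UBXN10": "DOWN",
    "DGKG": "DOWN",
    "CBX2": "UP",
    "UBE2C": "UP",
    "SHCBP1": "UP",
}


def inferred_regulation(gs):
    """Infer regulation from the sign of the gene significance (GS)."""
    return "UP" if gs > 0 else "DOWN"


def consensus(rows, stat_direction, mm_min=0.4, alpha=0.05):
    """Keep genes found by both approaches and flag concordant regulation."""
    results = []
    for gene, grade, mm, mm_p, gs, gs_p in rows:
        if gene not in stat_direction:
            continue  # not grade-salient in the statistical modeling
        if mm <= mm_min or mm_p >= alpha or gs_p >= alpha:
            continue  # weak or non-significant module/trait association
        network_dir = inferred_regulation(gs)
        concordant = network_dir == stat_direction[gene]
        results.append((gene, grade, network_dir, concordant))
    return results


for gene, grade, direction, concordant in consensus(TABLE7_ROWS, STAT_DIRECTION):
    print(gene, grade, direction, "concordant" if concordant else "discordant")
```

A gene is retained only if it appears in both result sets, so the consensus is an intersection and not a union; the concordance flag is reported separately so that discordant genes remain visible.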

The expression subset of the 35 consensus genes in the available 538 samples comprised the dataset for developing ML models for screening PRAD and predicting its aggressiveness. The mapping between this feature space and the sample annotation (benign, indolent or aggressive) was modeled using various classifiers (Table 8). Hyperparameters were optimized using 10-fold cross-validation. Different kernels were explored for the SVM classifier (viz. linear, radial and polynomial), and the results for the best-performing kernel are shown. The RandomForest model with optimal hyperparameters showed a balanced accuracy of ∼100% among the three outcome classes during cross-validation. SVM with radial kernel and XGBoost were also effective models, with >90% cross-validated balanced accuracy. A feature importance analysis of the RandomForest model using R caret⁹⁹ identified the top ten features based on mean decrease in Gini score¹⁰⁰ as: SLC43A1, FAM72B, CBX2, UBE2C, SHCBP1, TMEM132A, LOC100128675, TPX2, PHGR1, and DPYS (Figure 11). The consensus grade-I salient genes (SLC43A1 and PHGR1) were key features in the RandomForest solution, as were all seven consensus grade-V salient genes (FAM72B, CBX2, UBE2C, SHCBP1, TMEM132A, TPX2, and DPYS), validating the significant contribution of differential Gleason grade-specific biomarkers to the aggressive character of PRAD cancers. LOC100128675, also known as HPN-AS1, is the only grade-IV consensus gene among the top ten features; it is an androgen-responsive lncRNA with prognostic value in hormone-related cancers, particularly prostate cancer.¹⁰¹

Figure 11.

Importance ranking of the features used in building PRADclass. The top ten features are shown, collectively representing the greatest contribution to PRADclass. The scores are normalized with the importance of SLC43A1, a Gleason grade-I consensus biomarker. Other features represented here include consensus Gleason grade-V biomarkers (FAM72B, CBX2, UBE2C, SHCBP1, TMEM132A), and one consensus Gleason grade-IV biomarker (LOC100128675).

Table 8.

Hyperparameters and Performance Measures of Different Models Investigated for the Ternary Classification Problem of Sample as Normal (Possibly Benign Hyperplasia), Indolent Prostate Cancer or Aggressive Prostate Cancer, Based on Expression Levels of Consensus Genes.

S. No.	Classifier	Hyperparameters of Interest	Optimal Hyperparameters	CV Balanced Accuracy (%)
1	SVM (radial kernel)	cost, gamma	1, 0.1	93.87
2	RandomForest	ntree (# trees in the forest), mtry (# candidate variables randomly sampled for splitting)	500, 6	100.00
3	Neural Networks (1-layer)	size, decay	2, 1	77.94
4	Neural Networks (2-layer)	#units in hidden layer 1, #units in hidden layer 2	2, 26	89.18
5	XGBoost	eta, max_depth, colsample_bytree	0.01, 5, 1	92.37

Performance was measured using 10-fold cross-validation balanced accuracy for the ternary classification problem.

We searched cBioPortal with the keyword “prostate,” and retrieved the DKFZ Prostate Cancer study¹⁰² with 100 indolent and 18 aggressive samples (https://www.cbioportal.org/study/summary?id=prostate_dkfz_2018). The dataset was missing expression measurements for two of the 35 features, namely LOC100128675 and EMX2OS, which were imputed with missRanger¹⁰³ using the other predictor variables. Post-imputation, the model was evaluated on this external dataset (Table 9), yielding the following performance: (i) balanced accuracy: 0.814, (ii) AUROC: 0.83, and (iii) weighted F1-score: 0.81. To evaluate model performance on normal samples, we used another dataset, namely Suntsova et al,¹⁰⁴ which provided expression profiles of three prostate tissue samples from healthy donors (GSE120795). All three samples were correctly predicted as normal, providing evidence for discrimination of normal samples and for model generalization on out-of-domain data.

Table 9.

Composite Confusion Matrix for the Model Performance on External Datasets. The Model Was Evaluated on PRAD-DKFZ and GSE120795, Yielding a Balanced Accuracy of ∼0.86.

Reference \ Predicted	Aggressive	Control	Indolent
Aggressive	14	1	3
Control	0	3	0
Indolent	16	2	82
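The composite balanced accuracy quoted in the caption of Table 9 can be checked with a short Python sketch. It assumes that balanced accuracy is taken as the unweighted mean of the per-class recalls; the source does not state which definition was used, and other tools report a per-class variant based on sensitivity and specificity. The variable and function names are hypothetical.

```python
# Counts copied from Table 9: rows are reference classes, columns are predictions.
CONFUSION = {
    "Aggressive": {"Aggressive": 14, "Control": 1, "Indolent": 3},
    "Control": {"Aggressive": 0, "Control": 3, "Indolent": 0},
    "Indolent": {"Aggressive": 16, "Control": 2, "Indolent": 82},
}


def per_class_recall(confusion):
    """Fraction of each reference class that was predicted correctly."""
    return {ref: row[ref] / sum(row.values()) for ref, row in confusion.items()}


def balanced_accuracy(confusion):
    """Unweighted mean of per-class recalls (assumed definition)."""
    recalls = per_class_recall(confusion)
    return sum(recalls.values()) / len(recalls)


for label, recall in per_class_recall(CONFUSION).items():
    print(f"{label}: recall = {recall:.3f}")
print(f"Balanced accuracy = {balanced_accuracy(CONFUSION):.3f}")
```

Under this assumption the per-class recalls are 14/18, 3/3 and 82/100, whose mean is about 0.866, in line with the reported value of ∼0.86. Averaging recalls keeps the 100 indolent samples from dominating the 18 aggressive and three control samples.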

Encouraged by these results, we sought to further validate the consensus biomarkers. We performed a contrast analysis of gene expression in aggressive versus indolent PRAD samples in the TCGA dataset. Applying an absolute log fold-change (lfc) threshold of ∼1.0 and an adjusted P-value < 0.05, we obtained 166 DEGs. These included six consensus biomarkers (DPYS, TPX2, UBE2C, FAM72B, CBX2, and PHGR1) as well as twelve other grade-salient genes. A similar contrast analysis on the PRAD-DKFZ dataset yielded 2010 DEGs, including 13 consensus biomarkers (SLC43A1, PHGR1, CBX2, UBE2C, SHCBP1, TPX2, FAM72B, DPYS, TMEM132A, UBXN10, CAPN6, PCYT1B, and KIAA1644) as well as 19 other grade-salient genes. These results are given in Supplementary File S17. We also explored the novelty of our findings against published biomarker panels for prostate cancer management. Decipher is a 19-gene signature for treatment monitoring of prostate cancers,¹⁰⁵ and shared one consensus biomarker: UBE2C. The Oncotype DX GPS 17-gene panel is used for monitoring prostate tumor aggressiveness¹⁰⁶ and shared two consensus biomarkers: SFRP4 and TPX2. Prolaris is a 46-gene panel for monitoring prostate tumor aggressiveness,¹⁰⁷ and shared one consensus biomarker: PGC. Further panels explored included ProMark (used for typing tumor aggressiveness¹⁰⁸) and ConfirmMDx (used for predicting true negative prostate biopsies based on methylation patterns¹⁰⁹), but these panels shared no consensus biomarkers.
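The thresholding step in the contrast analyses above, followed by the overlap with a biomarker panel, can be sketched in Python as follows. This is a simplified illustration and not the authors' R workflow. The gene names, fold changes and P-values in `toy_results` and `toy_panel` are hypothetical placeholders, not results from the study, and the cutoffs are assumed to be applied as |lfc| ≥ 1.0 and adjusted P < 0.05.

```python
from typing import NamedTuple


class ContrastResult(NamedTuple):
    gene: str
    lfc: float    # log fold-change, aggressive versus indolent
    adj_p: float  # multiple-testing adjusted P-value


def select_degs(results, lfc_cutoff=1.0, alpha=0.05):
    """Keep genes passing both the unsigned lfc and adjusted P-value cutoffs."""
    return [r.gene for r in results
            if abs(r.lfc) >= lfc_cutoff and r.adj_p < alpha]


def shared_with_panel(degs, panel):
    """Genes present in both the DEG list and a biomarker panel, sorted by name."""
    return sorted(set(degs) & set(panel))


# Hypothetical toy values for illustration only.
toy_results = [
    ContrastResult("geneA", 1.8, 0.001),    # passes both cutoffs
    ContrastResult("geneB", -1.2, 0.03),    # passes (downregulated)
    ContrastResult("geneC", 0.4, 0.0001),   # fails the lfc cutoff
    ContrastResult("geneD", -2.1, 0.20),    # fails the adjusted P cutoff
]
toy_panel = ["geneB", "geneC", "geneE"]

degs = select_degs(toy_results)
print(degs)
print(shared_with_panel(degs, toy_panel))
```

The fold-change cutoff is applied to the absolute value so that both up- and downregulated genes are retained, which is what an unsigned threshold implies.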

The better-performing models, viz. RandomForest, SVM (radial kernel) and XGBoost, were re-built using the full dataset. A command-line interface including the RDS binaries of these models has been provided for academic and not-for-profit use at: https://github.com/apalania/PRADclass. The hyperparameter-optimized best-performing RandomForest model, PRADclass, has been deployed as a web-server (https://apalania.shinyapps.io/pradclass/). Features from expression profiling have earlier been used to build predictive models of cancers, for example, breast cancer.¹¹⁰ PRADclass may assist in risk stratification of prostate cancers, especially in settings where the availability of Gleason grading expertise is constrained. Such software-as-a-medical-device solutions could act as handy alternative decision support tools,¹¹¹ but further clinical validation of PRADclass is necessary. The robustness of the biomarker panel to the heterogeneity of prostate cancers needs to be ascertained, ensuring that the biomarker assay correctly stratifies patient samples presenting aggressive tumor foci, including those not encountered during model training. Prospective cohort studies could validate and improve PRADclass for future use in real-time point-of-care settings. We note that the identified grade-wise consensus biomarkers could further suggest potential targets for therapeutic interventions. Future research may address the fine discrimination among Gleason grades to enable a more refined three-way prostate cancer prognosis (low, intermediate or high-risk).

Conclusions

Gleason grading is the gold standard for risk stratification of prostate cancer patients, and molecular biomarkers specific to different Gleason grades could enable the stratification of high-risk prostate cancers that require active management versus low-risk prostate cancers. In this study, we have applied parallel computational pipelines, based on statistical modeling and network analysis of TCGA expression data, and identified grade-wise consensus biomarkers. Statistical modeling yielded ten grade-I salient genes (PHGR1, OR52R1, PCOTH, SLC43A1, ANXA8L2, RNF157, MUC15, SLC46A2, PTPRZ1, and SERPINB5), two grade-II salient genes (PPM1E and PPYR1), one grade-III salient gene (PLA2G2A), 34 grade-IV salient genes (including UBXN10, ATCAY, ANGPTL1, DGKG, NECAB1, ASB2, KIAA1644, LOC2824276, PCYT1B, RNF175), and 30 grade-V salient genes (including CBX2, FAM72B, SLC7A4, ASPN, UBE2C, SHCBP1, TPX2, IL1F5, CTHRC1, CD38). WGCNA modeling yielded 1003 genes. The consensus analysis yielded 35 biomarkers, viz. two genes in Grade 1 (SLC43A1, PHGR1), 26 genes in Grade 4 (including LOC100128675, PPP1R3C, NECAB1, UBXN10, SERPINA5, CLU, RASL12, DGKG, FHL1, NCAM1), and seven genes in Grade 5 (CBX2, DPYS, FAM72B, SHCBP1, TMEM132A, TPX2, UBE2C). To explore the histoprognostic value of these molecular markers, we constructed various machine learning models using these candidate biomarkers as the feature space. PRADclass, a RandomForest model, obtained near-perfect ten-fold cross-validation performance on the ternary problem of classifying a patient sample as benign, indolent prostate cancer or aggressive prostate cancer. Validation of PRADclass on external out-of-domain datasets yielded >85% balanced accuracy. Further work is essential to assess the clinical utility of PRADclass as a decision support aid, especially for prostate cancer stratification. PRADclass is experimentally available for academic and non-commercial use at https://apalania.shinyapps.io/pradclass. The candidate biomarkers might represent novel chemotherapy target hypotheses, ripe for experimental validation.

Footnotes

Acknowledgements

We would like to thank Amrutha Karthikeyan and Shivathmika Ramanathan for assistance with the WGCNA analysis. We are grateful to the School of Chemical and Biotechnology, SASTRA Deemed University, for infrastructure and computing support.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Ethical Statement

Our study did not require an ethical board approval because it did not contain human or animal trials. Our study uses public-domain de-identified TCGA datasets for analysis, and other public-domain de-identified datasets for validation.

Availability of data and materials

Public-domain datasets were used in this study. Data generated during the study are included as supplementary information to the published article as well as deposited in an online repository at: https://doi.org/10.6084/m9.figshare.22549621.v2.

ORCID iDs

Sangeetha Muthamilselvan

Rachanaa Raja

Ashok Palaniappan

References

1. Sung, Ferlay, Siegel, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209-249. doi:https://doi.org/10.3322/caac.21660

2. James. Cancer: A very short introduction. Oxford University Press; 2011.

3. Hassanipour-Azgomi, Mohammadian-Hafshejani, Ghoncheh, Towhidi, Jamehshorani, Salehiniya. Incidence and mortality of prostate cancer and their relationship with the human development Index worldwide. Prostate Int. 2016;4(3):118-124. doi:https://doi.org/10.1016/j.prnil.2016.07.001

4. Rawla. Epidemiology of prostate cancer. World J Oncol. 2019;10(2):63-89. doi:https://doi.org/10.14740/wjon1191

5. Wilt, Jones, Barry, et al. Follow-up of prostatectomy versus observation for early prostate cancer. N Engl J Med. 2017;377(2):132-142. doi:https://doi.org/10.1056/NEJMoa1615869

6. Lee, Mallin, Graves, et al. Recent changes in prostate cancer screening practices and epidemiology. J Urol. 2017;198(6):1230-1240. doi:https://doi.org/10.1016/j.juro.2017.05.074

7. Bruinsma, Bangma, Carroll, et al. Active surveillance for prostate cancer: A narrative review of clinical guidelines. Nat Rev Urol. 2016;13(3):151-167. doi:https://doi.org/10.1038/nrurol.2015.313

8. Salami, Hovelson, Kaplan, et al. Transcriptomic heterogeneity in multifocal prostate cancer. JCI Insight. 2018;3(21):e123468. doi:https://doi.org/10.1172/jci.insight.123468

9. American Cancer Society. 2023, Oct 11. Tests to Diagnose and Stage Prostate Cancer. https://www.cancer.org/cancer/types/prostate-cancer/detection-diagnosis-staging/how-diagnosed.html.

10. Borghesi, Ahmed, Nam, et al. Complications after systematic, random, and image-guided prostate biopsy. Eur Urol. 2017;71(3):353-365. doi:https://doi.org/10.1016/j.eururo.2016.08.004

11. Gleason. Classification of prostatic carcinomas. Cancer Chemother Rep. 1966;50(3):125-128.

12. Arvaniti, Fricker, Moret, et al. Automated Gleason grading of prostate cancer tissue microarrays via deep learning. Sci Rep. 2018;8(1):12054. doi:https://doi.org/10.1038/s41598-018-30535-1

13. Bulten, Pinckaers, van Boven, et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: A diagnostic study. Lancet Oncol. 2020;21(2):233-241. doi:https://doi.org/10.1016/S1470-2045(19)30739-9

14. Nagpal, Foote, Tan, et al. Development and validation of a deep learning algorithm for Gleason grading of prostate cancer from biopsy specimens. JAMA Oncol. 2020;6(9):1372-1380. doi:https://doi.org/10.1001/jamaoncol.2020.2485

15. Strom, Kartasalo, Olsson, et al. Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: A population-based, diagnostic study. Lancet Oncol. 2020;21(2):222-232. doi:https://doi.org/10.1016/S1470-2045(19)30738-7

16. Tolkach, Dohmgörgen, Toma, Kristiansen. High-accuracy prostate cancer pathology using deep learning. Nature Machine Intelligence. 2020;2(7):411-418. doi:https://doi.org/10.1038/s42256-020-0200-7

17. Vente, Vos, Hosseinzadeh, Pluim, Veta. Deep learning regression for prostate cancer detection and grading in Bi-parametric MRI. IEEE Trans Biomed Eng. 2021;68(2):374-383. doi:https://doi.org/10.1109/TBME.2020.2993528

18. Mun, Paik, Shin S-J, Kwak T-Y, Chang. Yet another automated Gleason grading system (YAAGGS) by weakly supervised deep learning. npj Digital Medicine. 2021;4(1):99. doi:https://doi.org/10.1038/s41746-021-00469-6

19. Singhal, Soni, Bonthu, et al. A deep learning system for prostate cancer diagnosis and grading in whole slide images of core needle biopsies. Sci Rep. 2022;12(1):3383. doi:https://doi.org/10.1038/s41598-022-07217-0

20. Shukla, Zhang, Niknafs, et al. Identification and validation of PCAT14 as prognostic biomarker in prostate cancer. Neoplasia. 2016;18(8):489-499. doi:https://doi.org/10.1016/j.neo.2016.07.001

21. Chen, Huang, et al. Widespread and functional RNA circularization in localized prostate cancer. Cell. 2019;176(4):831-843.e22. doi:https://doi.org/10.1016/j.cell.2019.01.025

22. Singh, Febbo, Ross, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell. 2002;1(2):203-209. doi:https://doi.org/10.1016/s1535-6108(02)00030-2

23. Pistore, Giannoni, Colangelo, et al. DNA methylation variations are required for epithelial-to-mesenchymal transition induced by cancer-associated fibroblasts in prostate cancer cells. Oncogene. 2017;36(40):5551-5566. doi:https://doi.org/10.1038/onc.2017.159

24. Nevedomskaya, Baumgart, Haendler. Recent advances in prostate cancer treatment and drug discovery. Int J Mol Sci. 2018;19(5):1359. doi:https://doi.org/10.3390/ijms19051359

25. Chang, Creighton, Davis, et al. The cancer genome atlas pan-cancer analysis project. Nat Genet. 2013;45(10):1113-1120. doi:https://doi.org/10.1038/ng.2764

26. Cancer Genome Atlas Research Network. The molecular taxonomy of primary prostate cancer. Cell. 2015;163(4):1011-1025. doi:https://doi.org/10.1016/j.cell.2015.10.025

27. Deng, Bragelmann, Kryukov, Saraiva-Agostinho, Perner. FirebrowseR: an R client to the Broad Institute's Firehose Pipeline. Database (Oxford). 2017;2017:baw160. doi:https://doi.org/10.1093/database/baw160

28. Egevad, Delahunt, Srigley, Samaratunga. International society of urological pathology (ISUP) grading of prostate cancer – an ISUP consensus on contemporary grading. APMIS. 2016;124(6):433-435. doi:https://doi.org/10.1111/apm.12533

29. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2013. http://www.R-project.org/.

30. Ritchie, Phipson, et al. Limma powers differential expression analyses for RNA-Sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47. doi:https://doi.org/10.1093/nar/gkv007

31. McCarthy, Smyth. Testing significance relative to a fold-change threshold is a TREAT. Bioinformatics. 2009;25(6):765-771. doi:https://doi.org/10.1093/bioinformatics/btp053

32. Hochberg, Benjamini. More powerful procedures for multiple significance testing. Stat Med. 1990;9(7):811-818. doi:https://doi.org/10.1002/sim.4780090710

33. Sarathi, Palaniappan. Novel significant stage-specific differentially expressed genes in hepatocellular carcinoma. BMC Cancer. 2019;19(1):663. doi:https://doi.org/10.1186/s12885-019-5838-3

34. Law, Chen, Shi, Smyth. Voom: Precision weights unlock linear model analysis tools for RNA-Seq read counts. Genome Biol. 2014;15(2):R29. doi:https://doi.org/10.1186/gb-2014-15-2-r29

35. Langfelder, Horvath. WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics. 2008;9:559. doi:https://doi.org/10.1186/1471-2105-9-559

36. Kanehisa, Goto. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27-30. doi:https://doi.org/10.1093/nar/28.1.27

37. Ashburner, Ball, Blake, et al. Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25-29. doi:https://doi.org/10.1038/75556

38. Sondka, Bamford, Cole, Ward, Dunham, Forbes. The COSMIC cancer gene census: Describing genetic dysfunction across all human cancers. Nat Rev Cancer. 2018;18(11):696-705. doi:https://doi.org/10.1038/s41568-018-0060-1

39. Repana, Nulsen, Dressler, et al. The network of cancer genes (NCG): A comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens. Genome Biol. 2019;20(1):1. doi:https://doi.org/10.1186/s13059-018-1612-0

40. Szklarczyk, Gable, Nastou, et al. The STRING database in 2021: Customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021;49(D1):D605-D612. doi:https://doi.org/10.1093/nar/gkaa1074

41. Tin Kam. Random decision forests. 1995:278-282 vol.1.

42. Breiman. Random forests. Mach Learn. 2001;45(1):5-32. doi:https://doi.org/10.1023/A:1010933404324

43. Cortes, Vapnik. Support-vector networks. Mach Learn. 1995;20(3):273-297. doi:https://doi.org/10.1007/BF00994018

44. Chen, Guestrin. Xgboost: A scalable tree boosting system. 2016:785-794.

45. Chang, WCJ, Allaire, Sievert, et al. shiny: Web Application Framework for R. https://github.com/rstudio/shiny.

46. Lex, Gehlenborg, Strobelt, Vuillemot, Pfister. Upset: Visualization of intersecting sets. IEEE Trans Vis Comput Graph. 2014;20(12):1983-1992. doi:https://doi.org/10.1109/TVCG.2014.2346248

47. Xin. EZH2 accompanies prostate cancer progression. Nat Cell Biol. 2021;23(9):934-936. doi:https://doi.org/10.1038/s41556-021-00744-4

48. Steinwald, Ledet, Sartor. Eradication of BRAF K601E mutation in metastatic castrate-resistant prostate cancer treated with cabazitaxel and carboplatin: A case report. Clin Genitourin Cancer. 2020;18(3):e312-e314. doi:https://doi.org/10.1016/j.clgc.2019.12.015

49. Magani, Peacock, Rice, et al. Targeting AR variant-coactivator interactions to exploit prostate cancer vulnerabilities. Mol Cancer Res. 2017;15(11):1469-1480. doi:https://doi.org/10.1158/1541-7786.MCR-17-0280

50. Paris, Sridharan, Hittelman, et al. An oncogenic role for the multiple endocrine neoplasia type 1 gene in prostate cancer. Prostate Cancer Prostatic Dis. 2009;12(2):184-191. doi:https://doi.org/10.1038/pcan.2008.45

51. Tian, Wang, et al. BUB1B promotes proliferation of prostate cancer via transcriptional regulation of MELK. Anticancer Agents Med Chem. 2020;20(9):1140-1146. doi:https://doi.org/10.2174/1871520620666200101141934

52. Zhang, Wang, et al. MNX1 is oncogenically upregulated in African-American prostate cancer. Cancer Res. 2016;76(21):6290-6298. doi:https://doi.org/10.1158/0008-5472.CAN-16-0087

53. Vasiljevic, Ahmad, Thorat, et al. DNA methylation gene-based models indicating independent poor outcome in prostate cancer. BMC Cancer. 2014;14:655. doi:https://doi.org/10.1186/1471-2407-14-655

54. Chung, Lee, Chou, Lin, Shih. NKX6.1 represses tumorigenesis, metastasis, and chemoresistance in colorectal cancer. Int J Mol Sci. 2020;21(14):5106. doi:https://doi.org/10.3390/ijms21145106

55. Cao, Becker, Lundwall, et al. Expression of protein C inhibitor (PCI) in benign and malignant prostatic tissues. Prostate. 2003;57(3):196-204. doi:https://doi.org/10.1002/pros.10296

56. Tanvir, Mondal. Stage-Specific Co-expression Network Analysis for Cancer Biomarker Discovery. IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Seoul, Korea (South), 2020;1813-1819. doi:https://doi.org/10.1109/BIBM49941.2020.9313242

57. Olokpa, Moss, Stewart. Crosstalk between the androgen receptor and PPAR gamma signaling pathways in the prostate. PPAR Res. 2017;2017:9456020. doi:https://doi.org/10.1155/2017/9456020

58. Fabregat, Sidiropoulos, Viteri, et al. Reactome pathway analysis: A high-performance in-memory approach. BMC Bioinformatics. 2017;18(1):142. doi:https://doi.org/10.1186/s12859-017-1559-2

59. Zhou, Han, Wilder-Romans, et al. Neddylation inactivation represses androgen receptor transcription and inhibits growth, survival and invasion of prostate cancer cells. Neoplasia. 2020;22(4):192-202. doi:https://doi.org/10.1016/j.neo.2020.02.002

60. Chen. Roles of ubiquitination and SUMOylation on prostate cancer: Mechanisms and clinical implications. Int J Mol Sci. 2015;16(3):4560-4580. doi:https://doi.org/10.3390/ijms16034560

61. Wang, Welsh, Tenniswood. 1,25-Dihydroxyvitamin D3 modulates lipid metabolism in prostate cancer cells through miRNA mediated regulation of PPARA. J Steroid Biochem Mol Biol. 2013;136:247-251. doi:https://doi.org/10.1016/j.jsbmb.2012.09.033

62. Turnham, Bullock, Dass, Staffurth, Pearson. The PTEN conundrum: How to target PTEN-deficient prostate cancer. Cells. 2020;9(11):2342. doi:https://doi.org/10.3390/cells9112342

63. Blaheta, Weich, Marian, et al. Human cytomegalovirus infection alters PC3 prostate carcinoma cell adhesion to endothelial cells and extracellular matrix. Neoplasia. 2006;8(10):807-816. doi:https://doi.org/10.1593/neo.06379

64. Cheng, Butler, Zhou, et al. Pre-existing castration-resistant prostate cancer–like cells in primary prostate cancer promote resistance to hormonal therapy. Eur Urol. 2022;81(5):446-455. doi:https://doi.org/10.1016/j.eururo.2021.12.039

65. Yue, et al. MUC15 loss facilitates epithelial-mesenchymal transition and cancer stemness for prostate cancer metastasis through GSK3β/β-catenin signaling. Cell Signal. 2021;84:110015. doi:https://doi.org/10.1016/j.cellsig.2021.110015

66. Diamantopoulou, Kitsou, Menashi, Courty, Katsoris. Loss of receptor protein tyrosine phosphatase β/ζ (RPTPβ/ζ) promotes prostate cancer metastasis. J Biol Chem. 2012;287(48):40339.

67. Labrecque, Coleman, Brown, et al. Molecular profiling stratifies diverse phenotypes of treatment-refractory metastatic castration-resistant prostate cancer. J. Clin. Investig. 2019;129(10):4492-4505. doi:https://doi.org/10.1172/JCI128212

68. Guan, Mao, Wang, et al. Exosomal RNF157 mRNA from prostate cancer cells contributes to M2 macrophage polarization through destabilizing HDAC1. Front Oncol. 2022;12(10):1021270. doi:https://doi.org/10.3389/fonc.2022.1021270

69. Anazawa, Nakagawa, Furihara, et al. PCOTH, a novel gene overexpressed in prostate cancers, promotes prostate cancer cell growth through phosphorylation of oncoprotein TAF-Ibeta/SET. Cancer Res. 2005;65(11):4578-4586. doi:https://doi.org/10.1158/0008-5472.CAN-04-4564

70. Abraham, Zhang, Greenberg, Zhang. Maspin functions as tumor suppressor by increasing cell adhesion to extracellular matrix in prostate tumor cells. J Urol. 2003;169(3):1157-1161. doi:https://doi.org/10.1097/01.ju.0000040245.70349.37

71. Rii, Sakamoto, Sugiura, et al. Functional analysis of LAT3 in prostate cancer: Its downstream target and relationship with androgen receptor. Cancer Sci. 2021;112(9):3871-3883. doi:https://doi.org/10.1111/cas.14991

72. Machtens, Serth, Bokemeyer, et al. Expression of the p53 and Maspin protein in primary prostate cancer: Correlation with clinical features. Int J Cancer. 2001;95(5):337-342.

73. Zhang. Genomic pan-cancer classification using image-based deep learning. Comput Struct Biotechnol J. 2021;19:835-846. doi:https://doi.org/10.1016/j.csbj.2021.01.010

74. Koh C-G, Tan E-J, Manser, Lim. The p21-activated kinase PAK is negatively regulated by POPX1 and POPX2, a pair of serine/threonine phosphatases of the PP2C family. Curr Biol. 2002;12(4):317-321. doi:https://doi.org/10.1016/S0960-9822(02)00652-8

75. Goc, Al-Azayzih, Abdalla, et al. P21 activated kinase-1 (Pak1) promotes prostate tumor growth and microinvasion via inhibition of transforming growth factor β expression and enhanced matrix metalloproteinase 9 secretion. J Biol Chem. 2013;288(5):3025-3035. doi:https://doi.org/10.1074/jbc.M112.424770

76. Felgueiras, Lobo, Vânia Camilo, Isa Carneiro, Bárbara Matos, Rui Henrique, Carmen Jerónimo, Margarida Fardilha. PP1 catalytic isoforms are differentially expressed and regulated in human prostate cancer. Exp Cell Res. 2022;418(2):0014-4827. doi:https://doi.org/10.1016/j.yexcr.2022.113282

77. Dong, Liu, Scott, et al. Secretory phospholipase A2-IIa is involved in prostate cancer progression and may potentially serve as a biomarker for prostate cancer. Carcinogenesis. 2010;31(11):1948-1955. doi:https://doi.org/10.1093/carcin/bgq188

78. Shah, Gagliano, Garland, et al. Androgen receptor signaling regulates the transcriptome of prostate cancer cells by modulating global alternative splicing. Oncogene. 2020;39(39):6172-6189. doi:https://doi.org/10.1038/s41388-020-01429-2

79. Rezvani. UBXD proteins: A family of proteins with diverse functions in cancer. Int J Mol Sci. 2016;17(10):1724. doi:https://doi.org/10.3390/ijms17101724

80. Hassounah, Nagle, Saboda, Roe, Dalkin, McDermott. Primary cilia are lost in preinvasive and invasive prostate cancer. PLoS One. 2013;8(7):e68521. doi:https://doi.org/10.1371/journal.pone.0068521

81. Cao, Tong, et al. A novel defined risk signature based on pyroptosis-related genes can predict the prognosis of prostate cancer. BMC Med Genomics. 2022;15(1):24. doi:https://doi.org/10.1186/s12920-022-01172-5

82.

Yang

Zhang

Mao

Chang

Perez-Losada

Mao

. Distinct clinical impact and biological function of angiopoietin and angiopoietin-like proteins in human breast cancer. Cells. 2021;10(10):2590. doi:https://doi.org/10.3390/cells10102590

83. Liu, Wang, Zhao, Liang, Huang. Identification of potential key genes for pathogenesis and prognosis in prostate cancer by integrated analysis of gene expression profiles and The Cancer Genome Atlas. Front Oncol. 2020;10:809. doi:https://doi.org/10.3389/fonc.2020.00809

84. Kamdar, Isserlin, Van der Kwast, et al. Exploring targets of TET2-mediated methylation reprogramming as potential discriminators of prostate cancer progression. Clin Epigenetics. 2019;11(1):54. doi:https://doi.org/10.1186/s13148-019-0651-z

85. Takayama, Misawa, Suzuki, et al. TET2 repression by androgen hormone regulates global hydroxymethylation status and prostate cancer progression. Nat Commun. 2015;6:8219. doi:https://doi.org/10.1038/ncomms9219

86. Landsittel, Jing, et al. Gene expression alterations in prostate cancer predicting tumor aggression and preceding development of malignancy. J Clin Oncol. 2004;22(14):2790-2799. doi:https://doi.org/10.1200/JCO.2004.05.158

87. Liu, Lin. Identification of potential crucial genes associated with the pathogenesis and prognosis of endometrial cancer. Front Genet. 2019;10:373. doi:https://doi.org/10.3389/fgene.2019.00373

88. Trousil, Lee, Pinato, et al. Alterations of choline phospholipid metabolism in endometrial cancer are caused by choline kinase alpha overexpression and a hyperactivated deacylation pathway. Cancer Res. 2014;74(23):6867-6877. doi:https://doi.org/10.1158/0008-5472.CAN-13-2409

89. Kai, Yamamoto, Sato, et al. Epigenetic silencing of diacylglycerol kinase gamma in colorectal cancer. Mol Carcinog. 2017;56(7):1743-1752. doi:https://doi.org/10.1002/mc.22631

90. Patel, Dutta, Syed, et al. TBX2 drives neuroendocrine prostate cancer through exosome-mediated repression of miR-200c-3p. Cancers (Basel). 2021;13(19):5020. doi:https://doi.org/10.3390/cancers13195020

91. Fang, Chen, et al. Effect of silencing the T-box transcription factor TBX2 in prostate cancer PC3 and LNCaP cells. Mol Med Rep. 2017;16(5):6050-6058. doi:https://doi.org/10.3892/mmr.2017.7361

92. Wang, Alpsoy, Sood, et al. A potent, selective CBX2 chromodomain ligand and its cellular activity during prostate cancer neuroendocrine differentiation. ChemBioChem. 2021;22(13):2335-2344. doi:https://doi.org/10.1002/cbic.202100118

93. Clermont, Crea, Chiang, et al. Identification of the epigenetic reader CBX2 as a potential drug target in advanced prostate cancer. Clin Epigenetics. 2016;8(1):16.

94. Yin, et al. SHCBP1 promotes tumor cell proliferation, migration, and invasion, and is associated with poor prostate cancer prognosis. J Cancer Res Clin Oncol. 2020;146(8):1953-1969. doi:https://doi.org/10.1007/s00432-020-03247-1

95. Bonollo, Thalmann, Kruithof-de Julio, Karkampouna. The role of cancer-associated fibroblasts in prostate cancer tumorigenesis. Cancers (Basel). 2020;12(7):1887. doi:https://doi.org/10.3390/cancers12071887

96. Zhou, Xiong, Zhou, et al. CTHRC1 and PD-1/PD-L1 expression predicts tumor recurrence in prostate cancer. Mol Med Rep. 2019;20(5):4244-4252. doi:https://doi.org/10.3892/mmr.2019.10690

97. Guo, Crespo, Gurel, et al.; International SU2C PCF Prostate Cancer Dream Team. CD38 in advanced prostate cancers. Eur Urol. 2021;79(6):736-746. doi:https://doi.org/10.1016/j.eururo.2021.01.017

98. Wang, Tang, Ren. Identification of UBE2C as hub gene in driving prostate cancer by integrated bioinformatics analysis. PLoS One. 2021;16(2):e0247827. doi:https://doi.org/10.1371/journal.pone.0247827

99. Kuhn. Building predictive models in R using the caret package. J Stat Softw. 2008;28(5):1-26. doi:https://doi.org/10.18637/jss.v028.i05

100. Giorgi, Gigliarano. The Gini concentration index: a review of the inference literature. J Econ Surv. 2017;31(4):1130-1148. doi:https://doi.org/10.1111/joes.12185

101. Wang, et al. Comprehensive characterization of androgen-responsive lncRNAs mediated regulatory network in hormone-related cancers. Dis Markers. 2020;2020:8884450. doi:https://doi.org/10.1155/2020/8884450

102. Gerhauser, Favero, Risch, et al. Molecular evolution of early-onset prostate cancer identifies molecular risk markers and clinical trajectories. Cancer Cell. 2018;34(6):996-1011.e8. doi:https://doi.org/10.1016/j.ccell.2018.10.016

103. Stekhoven, Bühlmann. MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28(1):112-118. doi:https://doi.org/10.1093/bioinformatics/btr597

104. Suntsova, Gaifullin, Allina, et al. Atlas of RNA sequencing profiles for normal human tissues. Sci Data. 2019;6(1):36. doi:https://doi.org/10.1038/s41597-019-0043-4

105. Ross, Feng, Ghadessi, et al. A genomic classifier predicting metastatic disease progression in men with biochemical recurrence after prostatectomy. Prostate Cancer Prostatic Dis. 2014;17(1):64-69.

106. Cullen, Rosner, Brand, et al. A biopsy-based 17-gene genomic prostate score predicts recurrence after radical prostatectomy and adverse surgical pathology in a racially diverse population of men with clinically low- and intermediate-risk prostate cancer. Eur Urol. 2015;68(1):123-131.

107. Szulkin, Whitington, Eklund, et al. Prediction of individual genetic risk to prostate cancer using a polygenic score. Prostate. 2015;75(13):1467-1474. doi:https://doi.org/10.1002/pros.23037

108. Blume-Jensen, Berman, Rimm, et al. Development and clinical validation of an in situ biopsy-based multimarker assay for risk stratification in prostate cancer. Clin Cancer Res. 2015;21(11):2591-2600.

109. Van Neste, Partin, Stewart, et al. Risk score predicts high-grade prostate cancer in DNA-methylation positive, histopathologically negative biopsies. Prostate. 2016;76(12):1078-1087.

110. Muthamilselvan, Palaniappan. BrcaDx: precise identification of breast cancer from expression data using a minimal set of features. Front Bioinform. 2023;3:1103493. doi:https://doi.org/10.3389/fbinf.2023.1103493

111. Muthamilselvan S, Ramasami Sundhar Baabu P, Palaniappan A. Microfluidics for profiling miRNA biomarker panels in AI-assisted cancer diagnosis and prognosis. Technol Cancer Res Treat. 2023;22. doi:https://doi.org/10.1177/15330338231185284