Dr. Answer AI for Prostate Cancer: Predicting Biochemical Recurrence Following Radical Prostatectomy

Abstract

Objectives:

To develop a model to predict biochemical recurrence (BCR) after radical prostatectomy (RP), using artificial intelligence (AI) techniques.

Patients and Methods:

This study collected data from 7,128 patients with prostate cancer (PCa) who received RP at 3 tertiary hospitals. After preprocessing, we used the data of 6,755 cases to generate the BCR prediction model. There were 16 input variables with BCR as the outcome variable. We used a random forest to develop the model. Several sampling techniques were used to address class imbalances.

Results:

We achieved good performance using a random forest with synthetic minority oversampling technique (SMOTE) using Tomek links, edited nearest neighbors (ENN), and random oversampling: accuracy = 96.59%, recall = 95.49%, precision = 97.66%, F1 score = 96.59%, and ROC AUC = 98.83%.

Conclusion:

We developed a BCR prediction model for RP. The Dr. Answer AI project, which was developed based on our BCR prediction model, helps physicians and patients to make treatment decisions in the clinical follow-up process as a clinical decision support system.

Keywords

artificial intelligence biochemical recurrence Doctor’s Answer prostate cancer PROMISE CLIP registry random forest radical prostatectomy

Introduction

Prostate cancer (PCa) is the most common cancer in men, and its treatment options vary widely. Among the treatment options, radical prostatectomy (RP) is the surgical removal of the prostate gland. RP may be used to treat PCa that has not spread beyond the prostate or has not spread very far. However, PCa can recur after radical prostatectomy.

Physicians determine the risk of cancer recurrence after RP based on prostate-specific antigen (PSA) level and cancer stage. Biochemical recurrence (BCR) is used when assessing the outcome of RP. BCR denotes an increase in the PSA level of patients who have received surgery and/or radiotherapy for PCa. BCR is used as an endpoint to assess treatment success.¹ That is, BCR may indicate that PCa has recurred; this is also termed PSA failure or biochemical relapse. BCR has important implications for the treatment of PCa and predicting BCR has been a topic of interest for a considerable period.^1
-3 We developed a model to predict BCR following RP, based on artificial intelligence (AI) techniques.

Recently, there has been a trend to introduce AI technology into the urology field. The application of AI has shown promise; AI has been reported to be excellent for outcome prediction and diagnosis.^4

-7 In South Korea, the PROMISE CLIP Project and the Dr. Answer AI have been ongoing since 2018.⁸ The Dr. Answer AI project is South Korea’s largest project to develop software using AI technology. The project focuses on 8 diseases: cardiocerebrovascular disease, cardiac disorder, breast cancer, colorectal cancer, PCa, dementia, epilepsy, and childhood genetic and rare diseases (http://dranswer.kr).⁹ PROMISE CLIP is project to develop intelligent software to solve medical demands related to PCa. The PROMISE CLIP registry consists of clinical, imaging, and pathology data for PCa based on multicenter EMR data.

Here, we developed a model to predict BCR following RP, using clinical data from the PROMISE CLIP project registry. We developed Dr. Answer AI software for PCa based on this prediction model.

Patients and Methods

BCR Definition

We developed a model to predict BCR after RP. BCR was defined as an increase in the blood level of PSA in patients with PCa after treatment with surgery or radiation, namely a patient whose PSA value exceeded 0.2 ng/mL at any point after RP.² This BCR definition was defined by discussion between 2 clinicians.

PCa Data for BCR Prediction Model

We collected data from 7,128 patients with PCa who received RP at 3 tertiary hospitals: Seoul St. Mary’s Hospital of the Catholic University, Samsung Medical Center, and Asan Medical Center. The participating hospitals are in Seoul and the Gyeonggi province (capital area). We utilized 6,755 cases out of 7,128 available, after excluding 369 for which relapse data were missing and 4 cases with a T-stage value of TX.

Input Variables for BCR Prediction Model

We used 16 variables to develop a model to predict BCR: age at diagnosis, BMI, marital status, education, smoking, drinking, family history of PCa, initial PSA, Gleason group, max positive core count, core ratio, neoplasm high risk malignant, extracapsular extension (ECE), seminal vesicle invasion (SVI), lymph node metastasis (LM), and T staging (Table 1). Neoplasm high risk malignant is a state neoplasm exists under epithelial tissue, not invading the stromal tissue into the basement membrane. In the Table 1, N is number of missing values and % is the missing values percentile. We filled the missing values with the mode for the categorical features and the median for the continuous feature grouped by age group and T stage.

Table 1.

Sixteen Input Variables for BCR Prediction Model.

No	Variables	Type	N	%
1	Age at diagnosis	Factor (ordinal)	5	0.001
2	BMI	Numeric (continuous)	22	0.003
3	Marital status	Factor	1,450	0.215
4	Education	Factor	2,729	0.404
5	Smoking	Factor	136	0.02
6	Drinking	Factor	123	0.018
7	Family history of prostate cancer	Factor	1,276	0.189
8	Initial PSA	Numeric (continuous)	75	0.011
9	Gleason group	Factor (ordinal)	458	0.068
10	Max positive core count	Numeric (continuous)	653	0.097
11	Core ratio	Numeric (continuous)	551	0.082
12	Neoplasm high risk malignant	Factor	175	0.026
13	Extracapsular extension	Factor	235	0.035
14	Seminal vesicle invasion	Factor	193	0.029
15	Lymph node metastasis	Factor	507	0.075
16	T staging	Factor	1,810	0.268

Random Forest for BCR Prediction Model

We used a random forest (RF) to develop the BCR prediction model. RF has the advantage of addressing small sample sizes, high dimensional feature spaces, and complex data structures in a healthcare context.¹⁰ The RF is nonparametric and interpretable, highly accurate, and efficient for diverse types of data. RF approaches have been used in a variety of healthcare fields, including PCa research.^11

-15

There existed a class imbalance problem in this study. The problem of data imbalance can be solved by 1) oversampling, 2) undersampling, and 3) combining oversampling and undersampling. We applied a random forest with diverse sampling methods. We used several undersampling methods such as One side selection (OSS), condensed nearest neighbor (CNN), edited nearest neighbors (ENN), and neighborhood cleaning rule (NCR). OSS is an undersampling technique that combines tomek links and the CNN rule. Tomek links are ambiguous points on the class boundary and are identified and removed in the majority class. The CNN method is then used to remove redundant examples from the majority class that are far from the decision boundary. Another undersampling method is ENN which is deleting the closest k data among multiple class data if not all or multiple classes. NCR is an undersampling technique that combines both the CNN rule to remove redundant examples and the ENN rule to remove noisy or ambiguous examples. Like OSS, the CSS method is applied in a 1-step manner, then the examples that are misclassified according to a k-nearest neighbor (KNN) classifier are removed, as per the ENN rule. Unlike OSS, less of the redundant examples are removed and more attention is placed on “cleaning” those examples that are retained.

We used oversampling methods such as random oversampling, adaptive synthetic sampling (ADASYN), synthetic minority oversampling technique (SMOTE), synthetic minority oversampling ENN, and synthetic minority oversampling tomek. Random Oversampling is replacing the data of a minority class over and over again. It is similar to increasing the weight. ADASYN method is a method of creating virtual fractional class data on a straight line between the fractional class data and data randomly selected from among the k-numbered fractional class data closest to the data. SMOTE is an oversampling method in which a sample of a class with a small number of data is taken and a new sample is created by adding a random value to the data. Synthetic minority oversampling ENN is a mixture of SMOTE method and ENN method. SMOTE Tomek is a mixture of SMOTE method and Tomeklink method.

Oversampling refers to a method of increasing the fractional class of data, including random sampling, ADASYN, and SMOTE. Undersampling is a method in which only a part of multiple class data is used. Methods include Tomek’s link method, ENN, and neighborhood cleaning rule. Additionally, it is possible to combine oversampling and undersampling, such as in SMOTE + ENN and SMOTE + Tomek. We applied several techniques to solve the class imbalance problem. Finally, we developed a new model that considers the influence of variables in general by using several ensemble sampling techniques: random forest with SMOTE + Tomek, ENN, and random sampling.

The data were divided into training datasets and a test dataset: 9 training datasets were matched to 1 test dataset. When developing predictive models, studies often use a dataset ratio of 8 to 2 or 7 to 3. As mentioned earlier, we used several undersampling and oversampling techniques to address class imbalance problems. In such techniques, the data used to train models contain values of matched characteristics as compared to the raw data, to improve model sensitivity. Accordingly, we considered it desirable to increase the size of the training dataset rather than increase the reliability of verification by increasing the size of the test dataset. Therefore, we used the aforementioned 9:1 ratio of training to test datasets.

Weka 3.8.3 and Python 3.7 were used as the machine learning programs. This study used the 5 Python libraries to develop the BCR prediction model: Pandas, NumPy, Sklearn, Pickle, and Imblearn. Finally, we achieved BCR prediction performance over 10-fold cross validation.

Ethics

The study was performed in accordance with the Declaration of Helsinki and was approved by 3 Institutional Review Boards: Catholic University (IRB number: KC18RNDI0509), Samsung Medical Center (IRB number: SMC201807069001), and Asan Medical Center (IRB number: 2018-0963).

Results

After data preprocessing, we finally used the data of 6,755 patients with PCa who received radical prostatectomy: 1,719 cases from Seoul St. Mary’s Hospital of the Catholic University (25%), 2,383 cases from Samsung Medical Center (35%), and 2,653 cases from Asan Medical Center (39%; Table 2). Among the 6,755 patients with PCa, 2,200 experienced BCR 200 (33%). Online Appendix A shows correlations among continous predictors resampling.

Table 2.

Basic Information.

Variable		Sample size	%
Hospital	Seoul St. Mary’s Hospital of the Catholic University	1,719	25
	Samsung Medical Center	2,383	35
	Asan Medical Center	2,653	39
BCR	Patients without BCR	4,555	67
	Patients with BCR	2,200	33
Total		6,755	100

Most patients were between 60 and 74 years of age at diagnosis (n = 5,024, 74.50%). Nearly all patients were married (n = 5,165; 97.40%), most had a high school education or lower (n = 3,604; 89.50%), 55.40% were non-smokers (n = 3,667), and 70.70% drank alcohol (n = 4,687). Most patients had no family history of PCa (n = 4,798; 87.60%). Among the patients, 36% had a Gleason score of 7 (n = 2,269), 77.30% had neoplasm high-risk malignancy (n = 5,084), 66.10% had no extracapsular extension (n = 4,312), 87.20% had no seminal vesicle invasion (n = 5,725), and 75.40% had no lymph node metastasis (n = 4,708). Stage 2 classification applied to 66.50% of patients (n = 3,286). The average BMI was 34.53, average initial PSA was 9.72, and the average maximum positive core count was 46.05. The average core ratio was 0.46 (Table 3).

Table 3.

Descriptive Statistics for 16 Input Variables.

Variable		Sample size	%	Total
Age at diagnosis	40-44	16	0.2	6,750
	45-49	54	0.8
	50-54	301	4.5
	55-59	846	12.5
	60-64	1,523	22.6
	65-69	1,856	27.5
	70-74	1,645	24.4
	75-79	495	7.3
	80-84	13	0.2
	Over 85	1	0
Marital status	Single	51	1	5,305
	Married	5,165	97.4
	Divorced	24	0.5
	Bereavement	65	1.2
Education	Uneducated	492	12.2	4,026
	Elementary school graduate	611	15.2
	Middle school graduate	1,256	31.2
	High school graduate	1,245	30.9
	University graduate and above	422	10.5
Smoking	Non-smoker	3,667	55.4	6,619
	Ex-smoker	2,303	34.8
	Smoker	649	9.8
Alcohol consumption	Drinker	4,687	70.7	6,632
Alcohol consumption	Non-drinker	1,945	29.3	6,632
Family history of PCa	No family history	4,798	87.6	5,479
	Family history with first cousin	567	10.3
	Family history with second cousin	114	2.1
Gleason group	3 + 3 = 6	1,672	26.6	6,297
	3 + 4 = 7	2,269	36
	4 + 3 = 7	1,006	16
	4 + 4 = 8	742	11.8
	Gleason sum ≥ 9	608	9.7
Neoplasm high risk malignant	No	1,496	22.7	6,580
Neoplasm high risk malignant	Yes	5,084	77.3	6,580
Extracapsular extension (ECE)	No	4,312	66.1	6,520
Extracapsular extension (ECE)	Yes	2,208	33.9	6,520
Seminal vesicle invasion (SVI)	No	5,725	87.2	6,562
Seminal vesicle invasion (SVI)	Yes	837	12.8	6,562
Lymph node metastasis (LM)	No	4,708	75.4	6,248
Lymph node metastasis (LM)	Yes	1,540	24.6	6,248
T staging	Stage 1	22	0.4	4,945
	Stage 2	3,286	66.5
	Stage 3	1,595	32.3
	Stage 4	42	0.8
BMI (Mean)	34.53			6,733
Initial PSA (Mean)	9.72			6,680
Max positive core count (Mean)	46.05			6,102
Core ratio (Mean)	0.46			6,204

Table 4 compares the performance of other algorithms with that of the final algorithm. The random forest with SMOTE + Tomek, ENN, and random sampling was chosen based on comparison with other algorithms. The random forest achieved good performance: accuracy = 96.59%, recall = 95.49%, precision = 97.66%, F1 score = 96.59%, and ROC AUC = 98.83%.

Table 4.

Algorithm Performance Results.

No	Algorithm and sampling methods	Accuracy	Recall	Precision	F1 score	ROC AUC
1	Random forest with SMOTE, one side selection	80.39	68.75	95.4	79.73	87.77
2	Random forest with SMOTE, ADASYN	76.45	70.17	80.26	76.39	84.62
3	Random forest with random oversampling, neighborhood cleaning rule	84.64	76.64	82.36	83.55	92.23
4	Random forest with SMOTEENN	95.22	94.88	95.41	94.97	96.74
5	Random forest with SMOTE Tomek, ENN, and random oversampling	96.59	95.49	97.66	96.59	98.83

Online Appendix B compares undersampling performance which shows mostly lower than 70% accuracy. It is poor performance compare to the oversampling performance.

The random forest model’s accuracy summary, after running the model 30 times with different seeds selected, shows minimum 0.9491, maximum 0.9704, and mean 0.9613 (Online Appendix C).

Software for BCR Prediction Model

We developed the Dr. Answer AI for PCa software based on the BCR prediction model for radical prostatectomy. Figure 1 shows the prediction software main screens. The SW receives the individual patient’s unique values and outputs changes before and after compared with previously entered data. The SW allows patients with PCa to recognize and prevent negative prognoses in advance, and facilitates the patient’s understanding of their current condition. The “Pathology” table summarizes the pathological results after surgery. Patients with PCa can become aware of their prognosis and risk of BCR in advance by referring to the results. The yellow bar on the right indicates the BCR prediction. The gray value on the left represents the most recent blood test value after surgery.

Figure 1.

Dr. Answer AI software for BCR prediction model.jpg.

Discussion

We developed a model to predict BCR following radical prostatectomy. The model achieved good performance (accuracy: 96.59%, AUC: 98.83%) for predicting BCR based on AI techniques. BCR represents an important clinical prognosis; previous studies have assessed treatment and target features.^16

-19 It has been reported that machine learning can achieve good performance in predicting BCR. In robot-assisted prostatectomy, the accuracy of K-nearest neighbor, random forest tree, and logistic regression were reported as 0.976, 0.953, and 0.976, respectively.¹⁷ Zhang et al (2016) indicated that an imaging-based approach using support vector machine (SVM) classification was superior at predicting PCa outcome(sensitivity = 93.3%, specificity = 91.7%, accuracy = 92.2%).²⁰ The performance of our model was significantly better than model performance reported in previous studies. These models offer a variety of individualized treatment options for the clinical follow-up process. The Dr. Answer AI software helps physicians and patients to make treatment decisions during the clinical follow-up process. The BCR prediction model following RP could be have meaningful clinical used, given its good performance.

In recent years, there has been a trend toward building AI-based predictive models in the field of urology.²¹ AI is effective for application to PCa, which is complex in diagnosis and treatment.⁴ However, although AI technology is playing a growing role in the urology field, practical implementation of AI models is limited.⁶ We developed the Dr. Answer AI software based on a random forest. It has been reported that random forests have high accuracy and efficiency when applied to diverse types of data. Random forests have been used in various aspects of prostate cancer research.^11

-15 The current software represents the response to one of the 8 diseases considered in the Dr. Answer AI project.⁹ The intention is to use the Dr. Answer AI software in hospitals from 2020, upon obtaining a software license from the Ministry of Food and Drug Safety. The Dr. Answer AI software has the advantage that it is technology that can be applied to actual medical practice.

Another advantage of the current model is that we used several techniques to overcome the class imbalance problem. We collected data from 7,128 patients with PCa following RP, and used the data of 6,755 cases; there were 2,200 patients with PCa patients who experienced BCR (33%). The problem of data imbalance is caused by the different number of datapoints in each class. There are several sampling techniques by which to address class imbalances. Most techniques overweight problems or overgeneralize certain variables in the model. The current study considered the influence of variables in general via several ensemble sampling techniques: RF with SMOTE + Tomek, ENN, and random oversampling. SMOTE + Tomek combines the sampling methods of the SMOTE and Tomek techniques. SMOTE randomly selects K-nearest neighbors of the minority class. Tomek is a data cleaning technique that defines a Tomek link as a pair neighbors with minimal Euclidian distance. SMOTE + Tomek utilizes both methods’ advantages, which consist of obtaining balanced data sets with a clear boundary between the majority and minority classes.²² ENN represents an undersampling method that discards a sample if it is misclassified followed a nearest neighbors classification. ENN addresses performance issues with nearest neighbor classification due to the presence of minority class samples. ENN performance approaches that of the Bayes classifier.²³

In contrast, random oversampling represents a method of repeatedly replacing imbalanced classes for analysis. Random oversampling constructs data sets with both an expected average and standard deviation equal to those of the original minority class data.²⁴ We were able to improve the accuracy of our model by applying several techniques to address class imbalance problems.

A further advantage of the current study is that it used the data of 6,755 patients with PCa from 3 centers. Previous studies infrequently exceeded a sample size of 1,000.⁴ We developed a model to predict BCR following radical prostatectomy using clinical data from the PROMISE CLIP registry. In multi-center studies such as this, there are differences in the data formats used by each hospital. It required effort to format each institution’s data because the data were not organized into a common data model (CDM). In addition, to ensure data security, the data of each institution were uploaded to the Naver Cloud Platform,²⁵ data were deleted after the project, and the researchers could not access individual datasets. When conducting large-scale research that involves multiple institutions, a CDM for data sharing is important. In South Korea, many hospitals are converting data into a CDM, but it takes time for multi-center research based on a CDM to become feasible. Thus, difficulties associated with multi-center research existed in this study but were addressed successfully. Accordingly, a suitable algorithm and associated software were developed.

Although the current findings are meaningful, the study was subject to some limitations. First, BCR referred to a patient whose PSA value was greater than 0.2 ng/mL at least once following RP.² However, other studies have proposed 0.4 ng/m as a suitable criterion for BCR.^1,3 That is, criteria deemed appropriate differ, depending on the researcher. These differences in criteria can change model performance. Future research could consider different definitions of BCR. Second, the data of the hospitals that participated in this project were not converted into a CDM, although the project was carried out successfully. Conversion of hospital data to the same standard and integration of the data required considerable effort. If hospital data were converted to a CDM, this would increase the efficiency of future large-scale multi-center studies.

PCa is one of the most complex cancers to diagnose and treat. Therefore, it is very important to accurately predict BCR and formulate treatment plans based on this knowledge. The BCR prediction model will be useful for the aggressive treatment of PCa. The Dr. Answer AI software helps physicians and patients to make treatment decisions during the clinical follow-up process as a clinical decision support system (CDSS).

Supplemental Material

Supplemental Material, sj-pdf-1-tct-10.1177_15330338211024660 - Dr. Answer AI for Prostate Cancer: Predicting Biochemical Recurrence Following Radical Prostatectomy

Supplemental Material, sj-pdf-1-tct-10.1177_15330338211024660 for Dr. Answer AI for Prostate Cancer: Predicting Biochemical Recurrence Following Radical Prostatectomy by Jihwan Park, Mi Jung Rho, Hyong Woo Moon, Jaewon Kim, Chanjung Lee, Dongbum Kim, Choung-Soo Kim, Seong Soo Jeon, Minyong Kang and Ji Youl Lee in Technology in Cancer Research & Treatment

Footnotes

Authors’ Note

Jihwan Park and Mi Jung Rho contributed equally to this work. The study was performed in accordance with the Declaration of Helsinki and was approved by 3 Institutional Review Boards: Catholic University (IRB number: KC18RNDI0509), Samsung Medical Center (IRB number: SMC201807069001), and Asan Medical Center (IRB number: 2018-0963).

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Authors Jaewon Kim, Changjung Lee, and Dongbum Kim are employed by LifeSemantics, which developed the Dr. Answer AI software for prostate cancer, and which owns the full rights to use the software for commercial purposes, as seen fit by the company. The co-first authors, Mi Jung Rho and Jihwan Park, are married couples. They participated in this project at the time of the research.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Institute for Information & communications Technology Promotion(IITP) grant funded by the Korea government (MSIT) (2018-0-00861, Intelligent SW Technology Development for Medical Data Analysis).

ORCID iD

Jihwan Park

Supplemental Material

Supplemental material for this article is available online.

References

Stephenson

Kattan

Eastham

, et al. Defining biochemical recurrence of prostate cancer after radical prostatectomy: a proposal for a standardized definition. J Clin Oncol. 2006;24(24):3973–3978.

Freedland

Sutter

Dorey

Aronson

. Defining the ideal cutpoint for determining PSA recurrence after radical prostatectomy. Urology. 2003;61(2):365–369.

Amling

Bergstralh

Blute

Slezak

Zincke

. Defining prostate specific antigen progression after radical prostatectomy: what is the most appropriate cut point? J Urol. 2001;165(4):1146–1151.

Chen

Remulla

Nguyen

, et al. Current status of artificial intelligence applications in urology and their potential to influence clinical practice. BJU Int. 2019;124(4):567–577.

Hung

Chen

Ghodoussipour

, et al. A deep-learning model using automated performance metrics and clinical features to predict urinary continence recovery after robot-assisted radical prostatectomy. BJU Int. 2019;124(3):487–495.

Checcucci

Autorino

Cacciamani

, et al. Uro-technology and SoMe Working Group of the Young Academic Urologists Working Party of the European Association of Urology. Artificial intelligence and neural networks in urology: current clinical applications. Minerva Urol Nefrol. 2020;72(1):49–57.

Hung

Chen

Che

, et al. Utilizing machine learning and automated performance metrics to evaluate robot-assisted radical prostatectomy performance and predict outcomes. J Endourol. 2018;32(5):438–444.

Park

Rho

Park

, et al. Promise clip project: a retrospective, multicenter study for prostate cancer that integrates clinical, imaging and pathology data. Appl Sci. 2019;9(15):2982.

Park

Rho

Moon

Lee

. Castration-resistant prostate cancer outcome prediction using phased long short-term memory with irregularly sampled serial data. Appl Sci. 2020;10(6):2000.

10.

Random forest for bioinformatics. In: Ensemble Machine Learning. Springer; 2012:307–323.

11.

Xiao

Chen

Gou

, et al. Prostate cancer prediction using the random forest algorithm that takes into account transrectal ultrasound findings, age, and serum levels of prostate-specific antigen. Asian J Androl. 2017;19(5):586–590.

12.

Litjens

Debats

Barentsz

Karssemeijer

Huisman

. Computer-aided detection of prostate cancer in MRI. IEEE Trans Med Imaging. 2014;33(5):1083–1092.

13.

Lay

Tsehay

Greer

, et al. Detection of prostate cancer in multiparametric MRI using random forest with instance weighting. J Med Imaging (Bellingham). 2017;4(2):024506.

14.

Hanson

Martin

O’Neil

, et al. The relative importance of race compared to health care and social factors in predicting prostate cancer mortality: a random forest approach. J Urol. 2019;202(6):1209–1216.

15.

Toth

Schiffmann

Hube-Magg

, et al. Random forest-based modelling to detect biomarkers for prostate cancer progression. Clin Epigenetics. 2019;11(1):148.

16.

Fernandes

Dinh

Walraven

, et al. Biochemical recurrence prediction after radiotherapy for prostate cancer with T2w magnetic resonance imaging radiomic features. Phys Imaging Radiat Oncol. 2018;7:9–15.

17.

Wong

Lam

Patterson

Shayegan

. Use of machine learning to predict early biochemical recurrence after robot-assisted prostatectomy. BJU Int. 2019;123(1):51–57.

18.

Lalonde

Ishkanian

Sykes

, et al. Tumour genomic and microenvironmental heterogeneity for integrated prediction of 5-year biochemical recurrence of prostate cancer: a retrospective cohort study. Lancet Oncol. 2014;15(13):1521–1532.

19.

Zhang

Hui

, et al. Improvement in prediction of prostate cancer prognosis with somatic mutational signatures. J Cancer. 2017;8(16):3261–3267.

20.

Zhang

Wang

, et al. An imaging-based approach predicts clinical outcomes in prostate cancer through a novel support vector machine classification. Oncotarget. 2016;7(47):78140–78151.

21.

Takeuchi

Hattori-Kato

Okuno

Iwai

Mikami

. Prediction of prostate cancer by deep learning with multilayer artificial neural network. Can Urol Assoc J. 2019;13(5):E145–E150.

22.

Zeng

Zou

Wei

Liu

Wang

. Effective prediction of three common diseases by combining SMOTE with Tomek links technique for imbalanced medical data. Paper presented at: 2016 IEEE International Conference of Online Analysis and Computing Science (ICOACS) ; 2016.

23.

Koplowitz

Brown

. On the relation of performance to editing in nearest neighbor rules. Pattern Recognition. 1981;13(3):251–255.

24.

Zhang

. RWO-Sampling: a random walk over-sampling approach to imbalanced data classification. Inf Fusion. 2014;20:99–116.

25.

Naver. Naver Cloud Platform. 2020.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.15 MB