Abstract
Prostate cancer is the most frequently diagnosed cancer in men in the United States. The current main methods for diagnosing prostate cancer include the prostate-specific antigen test and transrectal biopsy. Prostate-specific antigen screening has been criticized for overdiagnosis and unnecessary treatment, and transrectal biopsy is an invasive procedure with low sensitivity for diagnosis. We provide a quantitative tool that uses supervised learning with multiparametric imaging to accurately detect cancer foci and their aggressiveness. A total of 223 specimens from patients who received magnetic resonance imaging (MRI) and magnetic resonance spectroscopy imaging prior to surgery were studied. Multiparametric imaging included extracting T2-map, apparent diffusion coefficient (ADC) using diffusion-weighted MRI,
Background
Prostate cancer is the most frequently diagnosed cancer in men. The American Cancer Society estimated approximately 161 360 new cases of prostate cancer and about 26 730 deaths from prostate cancer in the United States in 2017. 1 Currently, the 2 main methods for diagnosing prostate cancer are the prostate-specific antigen (PSA) test in conjunction with a digital rectal examination, and transrectal biopsy. Prostate-specific antigen, which is measured by an immunoassay, has gained wide acceptance and has been approved by the Food and Drug Administration as a serum tumor diagnostic marker in the management of prostate cancer. 2 However, recent studies have shown that some men with low PSA levels (<4.0 ng/mL) have prostate cancer and many men with high PSA levels do not have prostate cancer. 3 In addition, it has been shown that there is little to no reduction in prostate cancer–specific mortality resulting from PSA screening, and PSA screening may be responsible for overdiagnosis and unnecessary treatment. 4 The conflicting evidence on the benefit of PSA makes it an unreliable method for prostate cancer diagnosis.5,6 The other commonly used diagnostic method, transrectal ultrasound (TRUS)-guided biopsy, uses a 12-core sampling of the prostate gland. Cancers can be missed if the regions containing them are not sampled. 7 Even when the biopsy does detect cancer, the localization of the tumor within the gland remains imprecise. 8 Due to the imprecise nature and low sensitivity of the biopsy procedure, patients may need to undergo repeated biopsies or switch to MRI/US fusion biopsy or even other types of biopsies. 9 This may lead to either delayed detection of aggressive cancer or unnecessary recurrent invasive biopsies in the absence of conclusive results. 10
Recently, multiparametric magnetic resonance (MR) imaging, which combines various functional MRI techniques with conventional T2-weighted imaging, has been established as a method for detection of prostate cancer.11,12 The functional imaging techniques include diffusion-weighted imaging (DWI), dynamic contrast-enhanced MRI (DCE-MRI), and magnetic resonance spectroscopy imaging (MRSI). Apparent diffusion coefficient (ADC) values from DWI have been used to differentiate prostate tumors from normal tissue because the magnitude of diffusion in prostate tumors is lower than that in the normal gland. 13 Several studies have shown that ADC values are associated with patients’ Gleason scores (GSs).14–16 DCE-MRI has also been used to differentiate malignant from normal tissue in the prostate gland. 17 MRSI aims to detect alterations in cellular metabolism that occur in prostate cancer. 18
It is known that conventional T2-weighted imaging alone cannot accurately identify tumors within the prostate. 19 To overcome this, DCE-MRI was combined with DWI to differentiate central gland cancer from benign prostatic hyperplasia. 20 DWI, DCE-MRI, and MRSI were incorporated to predict prostate cancer aggressiveness. 21 One group combined T2-weighted imaging, DWI, and DCE-MRI, 22 and another group combined T2-weighted MRI and ADC MRI 23 for prostate cancer detection.
Although combining several data sources can improve the quality of prediction, extracting complex relationships from multiple sources can be challenging. Advanced predictive models are required in addition to quality imaging sources. Machine learning methods, such as logistic regression, have been proposed to identify prostate cancer. 24 However, one challenge is class imbalance, namely, the number of instances of one class (eg, indolent disease samples) far exceeds that of the other class (eg, highly aggressive cancer samples). If a classifier is created without considering class imbalance, the result could be biased toward the majority class. Several methods have been proposed to deal with the class imbalance problem. 25 These methods can be categorized into 2 groups: cost sampling methods and data-level approaches. 26 The cost sampling methods use an asymmetric cost function to artificially balance the training process. 27 In contrast, the data-level approaches turn the imbalanced problem into a balanced one by either oversampling the minority class (replicating minority class observations or creating synthetic data)28,29 or undersampling (removing observations from the majority class).30,31
For the cost sampling approaches, the performance of the model relies heavily on the cost parameters, which are not known a priori. In addition, if the correlation between the predictors and the output variable is weak, which we found to be the case for the multiparametric MRI/MRSI data and the GS, oversampling has a negative effect on the predictive model. Hence, in this study, we used an undersampling approach to systematically deal with class imbalance and developed a noninvasive tool that applies supervised machine learning methods to multiparametric imaging data.
Methods
Patient cohort and specimen octants generation
Data were collected from 11 patients who had TRUS-guided biopsy-proven prostate cancer, elected to have radical prostatectomy, and received MRI/MRSI prior to their surgical procedure. The average PSA level of these patients was 9.4 (0.5-29.0) ng/mL. After radical prostatectomy, each prostate specimen was fixed in formalin and high-resolution MR images were obtained prior to whole mount sectioning of the prostate. Axial sections (3 mm) from the specimen were made using an in-house prostate slicer. Hematoxylin-eosin (H&E) staining was performed on 50-µm sections from each of the slides. Digital images of both the slice specimens and the pathologic slides were obtained and used for matching to the MR images. After discarding unusable slices, the remaining 28 slices were subdivided into octants. This resulted in 223 octants (1 octant was not usable). A GS was assigned to each octant by a pathologist. In our data set, GSs range from 0 to 8, with 0 indicating no cancer cell identified, GS ⩽ 6 indicating indolent (slow-growing or nonaggressive) disease, and GS > 6 indicating aggressive cancer. Figure 1 shows the distribution of GS in our data set.

Gleason score histogram.
Multiparametric MRI/MRSI
The following images were acquired: (a) conventional T2-weighted (T2W) images, (b) DWI-MRI, (c) DCE-MRI, and (d) MRSI covering the entire prostate using PRESS localization to attain MR spectroscopy score. Sample images are shown in Figure 2. This particular subject shows a tumor in the peripheral zone (arrows), and while it is difficult to locate the tumor foci on the T2W images and the T2-map, it can be readily detected using ADC and

An example of multiparametric imaging of prostate: Top row: T2-weighted (T2W) image, T2-map, H&E stain (histology). Bottom row: ADC map (DWI),
From these images, we extracted 4 types of quantitative features for predictive modeling. From T2W, we use the average of

Correlation plot of the average values of the 50th percentile voxels for features (ADC,
Predictive modeling via supervised machine learning
We considered 2 binary classification problems. In the first one, we aim to distinguish aggressive prostate cancer (GS > 6) from indolent disease and absence of cancer (GS ⩽ 6). In the second classification problem, we aim to detect cancerous samples (GS > 0).
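The two labeling rules can be written down directly from the GS thresholds stated above. The following is a minimal sketch; the function names and the example scores are hypothetical and are not taken from the study data.

```python
def aggressiveness_label(gleason_score):
    """Return 1 for aggressive cancer (GS > 6), else 0 (indolent or no cancer)."""
    return 1 if gleason_score > 6 else 0


def cancer_label(gleason_score):
    """Return 1 if any cancer is present (GS > 0), else 0."""
    return 1 if gleason_score > 0 else 0


# Hypothetical octant scores, used only to illustrate the two labelings.
example_scores = [0, 6, 7, 8, 0, 3]
aggressive = [aggressiveness_label(g) for g in example_scores]
cancer = [cancer_label(g) for g in example_scores]
print(aggressive)
print(cancer)
```

Note that an octant with GS = 0 is a negative in both problems, whereas an octant with GS between 1 and 6 is negative for aggressiveness but positive for cancer detection.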
Before building a predictive model, it is critical to handle the class imbalance problem. As seen in Figure 1, the number of nonaggressive samples (indolent disease or absence of cancer) was 187 and the number of aggressive ones was 36, a ratio of approximately 5:1. When the class distribution in the data set is imbalanced, a typical classifier would be biased toward the majority class because its goal is to maximize overall accuracy. Because there was a weak correlation between the features and the GS, as shown in Figure 3, oversampling approaches may increase the noise in the data, which deteriorates the quality of the predictive model. Therefore, we addressed the class imbalance with an undersampling method, which removes observations from the majority class to turn the training data set into a balanced one. For the aggressive cancer prediction problem, the method eliminated observations from the class that included indolent disease and absence of cancer observations. For the cancer foci detection problem, the number of noncancerous samples was 96 and the number of cancerous samples was 127, a ratio close to 1:1.3. Therefore, we treated this problem as balanced.
The machine learning model that was applied to extract complex relationships between the multiparametric imaging features and the GS was an ensemble method called boosting. 33 Boosting creates a highly accurate prediction model by combining multiple weak learners. Among the boosting methods, we used adaptive boosting, which is known as AdaBoost in the literature. 34 In our implementation of AdaBoost, we used decision trees as the weak learner, ie, the final classifier is a combination of several decision trees with different weights. For the decision tree classifiers, we used Gini’s diversity index to select the variable that splits the set of items at each step, and the minimum number of leaf node observations was set to 2.
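The undersampling step can be sketched as follows. This is a minimal illustration that assumes random removal of majority-class observations down to the minority-class size; the function name `random_undersample`, the seed, and the placeholder feature values are assumptions, and only the class counts (187 and 36) come from the text.

```python
import random


def random_undersample(features, labels, seed=0):
    """Randomly drop majority-class rows until every class has the minority size."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    minority_size = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        # Sample without replacement; the minority class is kept in full.
        kept.extend(rng.sample(indices, minority_size))
    kept.sort()
    return [features[i] for i in kept], [labels[i] for i in kept]


# Class counts from the text: 187 samples with GS <= 6 and 36 with GS > 6.
# The feature values are placeholders, not imaging measurements.
labels = [0] * 187 + [1] * 36
features = [[float(i)] for i in range(len(labels))]
balanced_x, balanced_y = random_undersample(features, labels)
print(balanced_y.count(0), balanced_y.count(1))
```

In this sketch the balanced set contains 36 observations per class, so 151 majority-class observations are discarded for that training run.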
Training set for the AdaBoost consisted of
where
An example of the AdaBoost is illustrated in Figure 4. Red and blue circles represent 2 different classes. The algorithm starts with equal weights for each observation in the training set at iteration

Weak classifiers for different iterations. (A) At iteration t = 1, a weak classifier is created for D1, where each observation has the same weight. (B) At iteration t = 2, after updating the weights of the observations, a new weak classifier is obtained for D2. (C) At iteration t = 3, the final weak classifier, h3, is generated.
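The three-iteration reweighting scheme illustrated above can be sketched in a few lines. This is a simplified illustration, not the implementation used in the study: it assumes labels coded as +1 and -1, uses single-threshold decision stumps chosen by weighted error instead of Gini-based decision trees, and runs on a hypothetical one-dimensional toy set. All function names are illustrative.

```python
import math


def fit_stump(points, labels, weights):
    """Find the single-feature threshold rule with the lowest weighted error."""
    best = None
    for feature in range(len(points[0])):
        for threshold in sorted({p[feature] for p in points}):
            for sign in (1, -1):
                preds = [sign if p[feature] <= threshold else -sign for p in points]
                error = sum(w for w, y, h in zip(weights, labels, preds) if y != h)
                if best is None or error < best[0]:
                    best = (error, feature, threshold, sign)
    return best


def stump_predict(stump, point):
    """Apply a fitted threshold rule to one observation."""
    _, feature, threshold, sign = stump
    return sign if point[feature] <= threshold else -sign


def adaboost(points, labels, rounds=3):
    """Train weighted stumps, increasing the weight of misclassified points."""
    n = len(points)
    weights = [1.0 / n] * n  # equal weights at the first iteration
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(points, labels, weights)
        error = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # weight of this learner
        ensemble.append((alpha, stump))
        weights = [w * math.exp(-alpha * y * stump_predict(stump, p))
                   for w, y, p in zip(weights, labels, points)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, point):
    """Classify by the sign of the weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(stump, point) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: no single threshold separates the two classes.
points = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
labels = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(points, labels, rounds=3)
print([predict(ensemble, p) for p in points])
```

On this toy set, every single threshold rule misclassifies at least 2 of the 6 points, but the weighted vote of 3 rules classifies all 6 training points correctly, which is the behavior the figure is meant to convey.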
To evaluate the performance of the model, we used cross-validation, which is a general model validation technique for assessing how well the predictions of a model generalize to an independent data set. 35
In this technique, data are separated into

Illustration of
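The fold construction for repeated cross-validation can be sketched as below. The sample count (223) and the 10 runs of 10 folds match the numbers reported in this article; the shuffling scheme, the seeds, and the function names are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k, seed):
    """Yield (train_indices, test_indices), holding out each fold exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# One run of 10-fold cross-validation over 223 octants.
splits = list(train_test_splits(223, 10, seed=0))
fold_sizes = [len(test) for _, test in splits]
print(fold_sizes)

# Repeating with a different seed per run gives 10 runs of 10 folds.
all_runs = [list(train_test_splits(223, 10, seed=run)) for run in range(10)]
```

In a pipeline like the one described here, undersampling would be applied to the training indices of each split only, so that the held-out fold keeps the original class mix.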
Results
After testing the average and standard deviation of different percentiles (eg, 5th, 10th, 50th, 90th, and 95th percentiles) of the 4 imaging features, the average of the 50th percentile features performed the best. These 4 features were used to demonstrate the results. To separate aggressive prostate cancer (GS > 6) from indolent disease (GS ⩽ 6), we created models using 2 features at a time. Figure 6 shows the probability obtained from the classifiers for 6 possible combinations of the 4 imaging features. Using Figure 6A as an example, an adaptive boost model was created with ADC and

Probability from AdaBoost representing aggressive prostate cancer (red) and indolent disease (blue) using combinations of 2 imaging features. (A) ADC and T2, (B) ADC and Ktrans, (C) ADC and Spectroscore, (D) T2 and Ktrans, (E) T2 and Spectroscore, (F) Ktrans and Spectroscore.

Classifiers separating aggressive prostate cancer (red) and indolent disease (blue) from AdaBoost using combinations of 2 imaging features. (A) ADC and T2, (B) ADC and Ktrans, (C) ADC and Spectroscore, (D) T2 and Ktrans, (E) T2 and Spectroscore, (F) Ktrans and Spectroscore.
Figures 8 and 9 show the quantitative results of our methods from 10 repetitions of 10-fold cross-validation. For distinguishing aggressive prostate cancer versus indolent disease (Figure 8), the averages and corresponding 95% confidence intervals of AUC, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were 0.73 (0.72-0.74), 0.72 (0.71-0.73), 0.73 (0.71-0.75), 0.34 (0.33-0.37), and 0.93 (0.92-0.94), respectively. For cancer foci detection (Figure 9), ie, classification between the absence of cancer (GS = 0) and presence of cancer (GS > 0), the averages and corresponding 95% confidence intervals of AUC, sensitivity, specificity, NPV, and PPV were 0.68 (0.66-0.70), 0.73 (0.70-0.77), 0.62 (0.60-0.68), 0.73 (0.71-0.76), and 0.65 (0.62-0.68), respectively.
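The low PPV alongside the high NPV for aggressiveness classification follows from the class mix. The sketch below recomputes both values from the reported sensitivity and specificity, assuming the evaluation data keep the original 36 aggressive and 187 nonaggressive samples; the function name is illustrative.

```python
def predictive_values(sensitivity, specificity, n_positive, n_negative):
    """Derive PPV and NPV from sensitivity, specificity, and class counts."""
    true_pos = sensitivity * n_positive
    false_neg = n_positive - true_pos
    true_neg = specificity * n_negative
    false_pos = n_negative - true_neg
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Reported averages for aggressiveness classification and the class counts.
ppv, npv = predictive_values(0.72, 0.73, n_positive=36, n_negative=187)
print(round(ppv, 2), round(npv, 2))
```

Under these assumptions the arithmetic gives a PPV of about 0.34 and an NPV of about 0.93, in line with the reported averages: with roughly 5 negatives per positive, even a specificity of 0.73 produces more false positives than true positives.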

Summary of prostate cancer aggressiveness classification accuracy from 10 runs of 10-fold cross-validations showing average and 95% confidence intervals of AUC, sensitivity, specificity, PPV, and NPV.

Summary of prostate cancer foci detection accuracy from 10 runs of 10-fold cross-validations showing average and 95% confidence intervals of AUC, sensitivity, specificity, PPV, and NPV.
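One common way to summarize a metric over 10 runs is a mean with a t-based 95% confidence interval. The article does not state how its intervals were computed, so the sketch below is an assumption, and the AUC values in it are hypothetical placeholders rather than study results.

```python
import math
import statistics


def mean_and_ci95(values, t_critical):
    """Return the mean and a t-based 95% confidence interval for the mean."""
    mean = statistics.mean(values)
    half_width = t_critical * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


# Hypothetical per-run AUC values (one per cross-validation run).
hypothetical_auc = [0.72, 0.74, 0.73, 0.71, 0.75, 0.73, 0.72, 0.74, 0.73, 0.73]

# 2.262 is the two-sided 95% critical value of Student's t with 9 degrees of freedom.
mean, low, high = mean_and_ci95(hypothetical_auc, t_critical=2.262)
print(round(mean, 3), round(low, 3), round(high, 3))
```

The interval narrows as the number of runs grows or as the run-to-run spread shrinks, which is why repeated cross-validation gives tighter summaries than a single run.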
Discussion
The current methods for prostate cancer diagnosis include PSA testing and transrectal biopsy. However, the accuracy of PSA testing is low, with sensitivity around 20% for detecting any prostate cancer and around 50% for detecting high-grade prostate cancers. 36 Biopsy is more reliable for prostate cancer diagnosis than PSA testing, but it is an invasive method. In a recent study, the reliability of a 12-core biopsy for prostate cancer detection was evaluated. 4 For patients with PSA levels of <4 ng/mL, 4-10 ng/mL, and >10 ng/mL, the sensitivities were 40%, 63%, and 76%, respectively. The average sensitivity for the whole test group was 59%. We provided a noninvasive supervised learning tool using multiparametric MRI/MRSI that achieved an average sensitivity of 73% for cancer foci detection, compared with the sensitivities reported above for PSA testing and biopsy.
When attempting to predict prostate cancer aggressiveness, previous studies excluded noncancerous observations (GS = 0). In this study, we included these observations when predicting prostate cancer aggressiveness. Although this made the classification problem more difficult (as seen in Figure 6, the positive class [GS > 6] and the negative class [GS ⩽ 6] are very close to each other), it is more realistic, and we were able to achieve an average AUC of 0.73 for prostate cancer aggressiveness prediction.
A potential limitation of this study is that all our data were from patients with prostate cancer and we did not have healthy prostate data as a control. However, many specimens were not cancerous (Figure 1). We tested the correlation between the GS of adjacent specimens. The correlation coefficient was 0.3. The correlation coefficient for specimens that were one more slice apart was 0.004. Therefore, we considered it a valid assumption to treat specimens as independent observations. We plan to include healthy prostate data in the future to test our tool.
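A correlation check of this kind can be sketched with a Pearson coefficient over pairs of scores. The article does not specify the coefficient or how neighboring specimens were paired, so both are assumptions here, and the paired scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical GS pairs: each specimen's score and its neighbor's score.
specimen_scores = [0, 6, 7, 0, 6, 8, 0, 6]
neighbor_scores = [6, 0, 6, 0, 7, 6, 6, 0]
r = pearson(specimen_scores, neighbor_scores)
print(round(r, 3))
```

A coefficient near 0 indicates little linear dependence between neighboring scores, while values closer to 1 would argue against treating the specimens as independent.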
It was critical to handle class imbalance when predicting prostate cancer aggressiveness. In practice, aggressive cancer represents only a small portion of the whole prostate. However, it is very important for clinicians to be able to identify aggressive cancer so that personalized treatment can be given. Dealing with class imbalance is still an ongoing research topic in the machine learning field, and few studies have addressed this issue in prostate cancer prediction. In this study, the number of observations in one class (GS ⩽ 6) significantly outnumbered that in the other class (GS > 6), with a ratio of 5:1 (187/36). We demonstrated that using undersampling with the AdaBoost model was an effective way of handling class imbalance for prostate cancer aggressiveness prediction.
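The effect of the 187/36 split on an accuracy-driven classifier can be seen with a short calculation. The sketch below considers a hypothetical classifier that always predicts the majority class (GS ⩽ 6); it is an illustration of the imbalance problem, not a model evaluated in this study.

```python
n_majority = 187  # samples with GS <= 6
n_minority = 36   # samples with GS > 6
total = n_majority + n_minority

# A classifier that always predicts "not aggressive" gets every majority
# sample right and every minority sample wrong.
baseline_accuracy = n_majority / total
baseline_sensitivity = 0 / n_minority
imbalance_ratio = n_majority / n_minority

print(round(baseline_accuracy, 3), baseline_sensitivity, round(imbalance_ratio, 2))
```

Such a classifier reaches an accuracy of about 0.839 while detecting no aggressive cancer at all, which is why overall accuracy is a misleading objective for this problem.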
After prostate cancer diagnosis, many types of treatment are available, including radiotherapy, endocrine therapy, and surgery. For men diagnosed with aggressive cancer, the goal is to keep the disease from spreading. Physicians can treat these patients with localized therapies such as surgery and radiotherapy. Systemic treatments, such as hormonal therapy, can also be used for these patients. A recent study shows that a mix of different treatments improves survival of patients with Gleason 9 and 10. 37 If aggressive prostate cancer can be identified early using the tools provided in this work, these types of treatment can be considered by the physician.
Conclusions
Our results on both the cancer foci detection and aggressiveness classification problems showed that using multiparametric MRI/MRSI with machine learning methods could provide clinicians with a more accurate predictive tool for prostate cancer assessment. Adaptive boosting with random undersampling could accurately identify highly aggressive prostate cancer. This noninvasive method will allow for nonsubjective disease characterization, which provides physicians with information to make personalized treatment decisions.
Footnotes
Funding:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by US Department of Defense through the grant number W81XWH-04-1-0249.
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author Contributions
GK built the supervised machine learning tool and led the analysis of the prediction results and the writing of the manuscript. RG and WD interpreted the clinical significance of the results. GMDI analyzed the results and generated the figures. MN, JW, and NM processed the MRI images and collected the T2W, DWI-MRI and MRSI data. JP provided the histology on the data and the pathology interpretation. SR performed the octant work and collected the DCE-MRI data. RG, WD, and HZ conceived and designed the overall study. HZ designed the predictive modeling framework, supervised GK in developing the supervised machine learning tool, provided intellectual input, and participated in analyzing and interpreting the results. All authors were involved in either drafting or revising the manuscript and all authors approved the final manuscript.
