Abstract
Prostate cancer is the second leading cause of cancer deaths among men. Early detection can effectively reduce the mortality caused by prostate cancer. The high, multiresolution nature of prostate MRIs requires proper diagnostic systems and tools. In the past, researchers developed computer-aided diagnosis (CAD) systems that help the radiologist detect abnormalities. In this research paper, we have employed novel machine learning techniques such as the Bayesian approach, support vector machine (SVM) kernels (polynomial, radial base function (RBF) and Gaussian) and Decision Tree for detecting prostate cancer. Moreover, different feature-extraction strategies are proposed to improve the detection performance. The feature-extraction strategies are based on texture, morphological, scale-invariant feature transform (SIFT), and elliptic Fourier descriptor (EFD) features. The performance was evaluated using single features as well as combinations of features with the machine learning classification techniques. Cross validation (Jack-knife k-fold) was performed, and performance was evaluated in terms of the receiver operating characteristic (ROC) curve, specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV) and false positive rate (FPR). Based on single feature-extraction strategies, the SVM Gaussian kernel gives the highest accuracy of 98.34% with an AUC of 0.999, while, using combinations of feature-extraction strategies, SVM Gaussian kernel with texture
Keywords
Introduction
Prostate cancer is the most commonly diagnosed cancer among men and remains the second leading cause of cancer deaths in men globally. In 2017, there will be 161,360 new cases of prostate cancer, and approximately 26,730 men will die from prostate cancer in the United States [1]. In 2013, there were approximately 240,000 and 40,000 cases of prostate cancer reported in the USA and UK respectively, and the number is estimated to reach 1.7 million cases globally by 2030 [2]. The incidence of prostate cancer varies worldwide, with the highest rates found in the United States, Canada, and Scandinavia, and the lowest rates found in China and the rest of Asia. The risk of developing prostate cancer is related to advancing age, African American ethnicity, and a positive family history, and might be influenced by diet and other factors [3]. Detecting prostate cancer at an early stage can help nine out of ten men survive beyond five years. However, early detection of prostate cancer remains a source of controversy and uncertainty [4]. The detection of prostate cancer at an early stage is crucial to increase the likelihood of successful treatment. Conventional prostate cancer detection uses digital rectal examinations and serum prostate-specific antigen levels [5]. Brachytherapy represents one of the oldest techniques of radiation therapy for prostate cancer [6]. The techniques for transperineal permanent prostate brachytherapy are relatively modern, have been developed within the past years, and the selected cohorts of this study represent some of the largest series with longer follow-up [7].
Medical imaging has gained much importance within the last few decades, especially in analyzing different body parts [8]. Researchers have developed different clinical diagnostic tools, such as digital rectal examination (DRE), prostate-specific antigen (PSA), transrectal ultrasound (TRUS) and biopsy tests, which are the most widely used for detecting prostate cancer, though they do not always yield accurate results [9]. Schröder et al. [10] employed PSA to detect prostate cancer and reduced the death rate by 20%, but the benefit was associated with overtreatment and overdiagnosis. It was observed that the PSA test could not predict cancer aggressiveness. Thus, non-aggressive and slow-growing prostate cancer is frequently diagnosed in older patients [11]. Moreover, TRUS-guided biopsy did not detect all clinical cancers [4].
However, since the accuracy of TRUS is limited, magnetic resonance imaging (MRI) has been proposed as an alternative to TRUS because of its superior soft-tissue imaging capabilities [12, 13]. Many studies have demonstrated that MRI offers higher resolution to help detect smaller volumes of prostate cancer with higher accuracy than TRUS [12], and it may be considered a promising technique for prostate cancer localization. Prostate MRI has become an increasingly common adjunctive procedure in the detection of prostate cancer [14]. The development of computer-aided diagnosis (CADx) tools for MRI of the prostate holds great promise for improved detection and characterization of prostate cancer [15]. The texture of an image characterizes the appearance, structure and arrangement of the parts of an object within the image [15], and morphological analysis of medical images is used in many research and clinical studies that investigate the effects of diseases and treatments on anatomical structure [16].
In previous studies, researchers extracted different features from MRI images to detect prostate cancer. Perez et al. extracted texture features using Gabor filters, the grey-level co-occurrence matrix (GLCM), local binary patterns (LBP), the Haar transform, and Hu moments together with statistical features to classify prostate cancer. A high performance with AUC values of 0.81 to 0.85 was obtained with the union of texture features from the parametric maps [17]. Han et al. proposed a new prostate detection method using multiresolution autocorrelation texture features and clinical features such as the location and shape of the tumor. The cancerous tissues were detected efficiently with high specificity (about 90–95%) and high sensitivity (about 92–96%), measured by the number of correctly classified pixels; a support vector machine (SVM) was used to classify tissues based on the texture features [18]. De Rooij et al. proposed a hybrid morphological-textural model in which different texture and morphological features extracted from MRI are combined for classification, and obtained improved results with respect to specificity and sensitivity [19]. Doyle et al. [20] extracted texture features, first-order statistics, co-occurrence features and wavelet features to perform pixel-wise Bayesian classification at each image scale to obtain corresponding likelihood scenes. They applied the AdaBoost algorithm to combine the most discriminating features and found an overall classification accuracy of 88%. Daliri [21] extracted scale-invariant feature transform (SIFT) features from MRI, classified disease (Alzheimer's) using the SVM algorithm, and obtained 86% accuracy. Sahrim et al. [22] performed image analysis with a boundary description derived using Fourier descriptors to detect the presence of Alzheimer's disease, showing how image analysis using EFDs can be used to detect diseases.
The existing techniques have some limitations, as only a few feature-extraction strategies were employed, which may not properly capture the valuable information in prostate MRIs. Moreover, much of the knowledge hidden within the MR images can be extracted using complexity-based sample entropy and wavelet entropy features, because the complexity of healthy subjects is greater than that of pathological subjects and is reduced by the degradation of structural and functional coupling. Thus, these features offer important information to differentiate the normal and cancer subjects. In this study, different feature-extraction strategies such as texture, morphological, sample entropy, wavelet entropy, SIFT, and EFD features are proposed to extract the valuable information from the prostate cancer MRIs, which is then used as input to machine learning classifiers including the support vector machine (SVM) and its kernels, Decision Trees and the Bayesian approach.
Schematic diagram of machine learning (ML) classification techniques to classify the prostate cancer and brachytherapy subjects based on various feature-extraction strategies.
Fig. 1 shows the schematic diagram of the proposed system. In the first step, the images are taken as input from the relevant database. In the second step, features such as texture, morphological, scale-invariant feature transform (SIFT), elliptic Fourier descriptor (EFD) and entropy-based features are extracted. The extracted features (single and in different combinations) are then passed as input to machine learning (ML) classifiers such as SVM with polynomial, RBF and Gaussian kernels, Bayesian classifiers and Decision Tree classifiers. Finally, the training and target data split was made using Jack-knife 10-fold cross validation to classify the prostate and brachytherapy subjects.
Dataset
The dataset was taken from a publicly available database provided by Harvard University (National Center for Image Guided Therapy, Department of Radiology, Brigham and Women's Hospital, Harvard Medical School), funded by the National Institutes of Health, available at (
Features extraction strategies
The first and most important step in a classification problem is to extract and select the relevant features based on the type and characteristics of the problem. In the past, researchers extracted different features for classification and detection purposes. Rathore et al. [8, 23, 24] extracted geometric and hybrid features to detect and predict colon cancer. Moreover, Hussain et al. [25] extracted acoustic and Mel-frequency cepstral coefficient (MFCC) features for emotion recognition in human speech, geometric and texture features [26, 27] for detection and recognition of human faces, and complexity-based features [27, 28] for heart rate variability and to distinguish alcoholic and non-alcoholic subjects.
SVM (a) linear separation and (b) margin.
The morphology of tissue is important to determine whether tissues are normal or not. Morphological features are extracted from images by converting the morphology of the images into a set of quantitative values used in classification [29, 30, 31], segmentation [32] and so on. Shape-based features are most widely used to classify the masses present in medical images [33]. The shape-based (morphological) features extracted for binary images are: (a) Area (Ar), (b) Perimeter (PMT), (c) Maximum Radius (MAX_RD), (d) Minimum Radius (MIN_RD), (e) Eccentricity (ECTT), (f) Equivdiameter (EQDT), (g) Elongatedness (EGDN), (h) Entropy (ETP), (i) Circularity1 (CIR_1), (j) Circularity2 (CIR_2), (k) Compactness (CPN), (l) Dispersion (DPS), (m) Thinness Ratio (TN-R), (n) Standard Deviation of Image (SDV), (o) Standard Deviation of Edge (ESDV), (p) Shape Index (S-ID). The definitions and descriptions are taken from Surendiran and Vadivel [34] and Bresson and Chan [35].
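As an illustration, a few of the listed descriptors can be sketched in numpy as follows. This is not the paper's Matlab implementation: the function name, the 4-connectivity perimeter approximation and the circularity definition are our own assumptions, and the paper's exact formulas follow Surendiran and Vadivel [34].

```python
import numpy as np

def morph_features(mask):
    """Sketch of a few shape descriptors from a binary mask.
    Perimeter is approximated by counting object pixels that touch
    the background under 4-connectivity (a discrete approximation)."""
    mask = np.asarray(mask, dtype=bool)
    area = int(mask.sum())
    p = np.pad(mask, 1)                       # pad so border pixels behave uniformly
    core = p[1:-1, 1:-1]
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    perimeter = int((core & ~interior).sum()) # object pixels with a background neighbour
    equiv_diameter = float(np.sqrt(4.0 * area / np.pi))       # EQDT
    circularity = float(4.0 * np.pi * area / perimeter ** 2)  # one common CIR definition
    return {"area": area, "perimeter": perimeter,
            "equiv_diameter": equiv_diameter, "circularity": circularity}
```

For example, a filled 5x5 square yields an area of 25 and a discrete perimeter of 16 boundary pixels.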
Texture features
In previous studies, texture features have been most widely used in solving classification problems [36, 37, 38], particularly to classify colon biopsies [39, 40]. The texture features are calculated from the grey-level co-occurrence matrix (GLCM), which captures the spatial relationship between the pixels of an image. Each entry (i, j) of the GLCM records how often a pixel with grey level i occurs in a given spatial relationship to a pixel with grey level j.
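As a sketch of this idea, a GLCM for a fixed offset and three common co-occurrence statistics can be computed as follows; the offset, the number of grey levels and the chosen statistics are illustrative assumptions, and the paper's full texture feature set may differ.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Grey-level co-occurrence matrix: entry (i, j) counts how often
    grey level i co-occurs with grey level j at pixel offset (dy, dx),
    normalised to joint probabilities."""
    g = np.zeros((levels, levels), dtype=float)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                g[img[y, x], img[y2, x2]] += 1
    return g / g.sum()

def texture_features(g):
    """Three common statistics derived from a normalised GLCM."""
    i, j = np.indices(g.shape)
    return {
        "contrast":    float(((i - j) ** 2 * g).sum()),
        "energy":      float((g ** 2).sum()),
        "homogeneity": float((g / (1.0 + np.abs(i - j))).sum()),
    }
```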
Scale invariant feature transform (SIFT)
Lowe [42] proposed SIFT features, which have been used to analyze the problems of panorama reconstruction [43], face identification [44, 45, 46] and visual object tracking [47]. Due to their robustness to illumination changes, rotation, noise, scaling and blurring, SIFT features have been used in a wide area of research. These characteristics make SIFT features useful for the classification of prostate cancer samples. In the initial step of extracting SIFT features, the key points are localized in an image. The scale space is created by convolving the image with Gaussians, and neighbouring Gaussian-convolved images are subtracted to create the difference-of-Gaussian images. The original image is then downsampled by a factor of 2 after the differences at one scale are computed, and this process is repeated until the lowest possible scale is reached. A large number of key points is identified in this initial step, which is then reduced in the subsequent stages. In the second step, every pixel is compared with its 8 neighbours in its own scale and the 9 neighbours in the scale below and above it; after this step, the points whose values are smaller or greater than all the neighbouring pixels are retained. In the third step, the points which are poorly localized along edges or have poor contrast are discarded. Orientations and descriptors are assigned to the remaining key points.
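The first step described above (Gaussian scale space and difference of Gaussians) can be sketched in numpy as follows. The sigma values, kernel radius and function names are illustrative assumptions, not Lowe's exact parameters, and full SIFT adds octaves, extrema detection and descriptors.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: 1-D convolution along rows, then columns."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    pad = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, "valid"), 1, pad)
    return np.apply_along_axis(lambda v: np.convolve(v, k, "valid"), 0, rows)

def dog_pyramid(img, sigmas=(1.0, 1.6, 2.56)):
    """Difference-of-Gaussian images: adjacent blurred copies subtracted."""
    blurred = [gaussian_blur(img, s) for s in sigmas]
    return [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
```

A constant image produces (numerically) zero DoG response, since the normalised Gaussian blur leaves it unchanged.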
(a) Error on margin using slack variable, (b, c) SVM non-linear separation.
The EFD features are useful for discriminating images containing elliptic shapes. EFD features were introduced in 1982 by Kuhl and Giardina [48] to classify solid objects such as cars, boxes, etc. These features have also been widely used in pattern recognition systems [49, 50]. Computing the EFD features requires two stages. In the initial stage, the elliptic objects are recognized in the white clusters of the images. In the second stage, the elliptic objects are sorted on the basis of their area, and the EFDs of the top L objects are computed up to the desired level X. EFDs are based on the chain code, approximating the shape of a closed contour by a sequence of line segments in eight standardized directions, and are invariant to translation, dilation, rotation and the starting point of a contour. To extract the EFDs, H harmonic levels are used and four Fourier coefficients i.e.
where
The final Fourier feature vector is obtained by combining the above average vectors
where
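The Kuhl-Giardina coefficients (a_n, b_n, c_n, d_n) can be computed from a closed contour as in the following sketch; the function name and default harmonic order are our own choices, and normalisation for rotation and starting point is omitted.

```python
import numpy as np

def elliptic_fourier_descriptors(contour, order=10):
    """Elliptic Fourier coefficients of a closed contour given as an
    (N, 2) array of (x, y) points, following Kuhl and Giardina (1982).
    Returns an (order, 4) array of rows (a_n, b_n, c_n, d_n)."""
    d = np.diff(np.vstack([contour, contour[:1]]), axis=0)  # close the contour
    dt = np.sqrt((d ** 2).sum(axis=1))                      # segment lengths
    t = np.concatenate([[0.0], np.cumsum(dt)])              # cumulative arc length
    T = t[-1]
    coeffs = np.zeros((order, 4))
    for n in range(1, order + 1):
        c = 2.0 * np.pi * n / T
        cos_t, sin_t = np.cos(c * t), np.sin(c * t)
        k = T / (2.0 * n ** 2 * np.pi ** 2)
        coeffs[n - 1, 0] = k * np.sum(d[:, 0] / dt * (cos_t[1:] - cos_t[:-1]))  # a_n
        coeffs[n - 1, 1] = k * np.sum(d[:, 0] / dt * (sin_t[1:] - sin_t[:-1]))  # b_n
        coeffs[n - 1, 2] = k * np.sum(d[:, 1] / dt * (cos_t[1:] - cos_t[:-1]))  # c_n
        coeffs[n - 1, 3] = k * np.sum(d[:, 1] / dt * (sin_t[1:] - sin_t[:-1]))  # d_n
    return coeffs
```

For a unit circle sampled densely, the first harmonic dominates with a_1 and d_1 close to 1, as expected for an ellipse with both semi-axes equal to the radius.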
Biological signals are the output of multiple interacting components of biological systems, which exhibit complex rhythmic patterns. These rhythms and patterns are altered by malfunctions in structural components and reduced interactions in coupling functions. The changes in pattern contain very useful information for understanding the underlying dynamics of these systems, which can be extracted in the form of complexity measures computed using information-theoretic approaches. Recently, researchers have used complexity-based measures [51, 52, 53, 54, 55, 56] and wavelet packet entropy [57, 58] methods to quantify and analyse the dynamics of physiological systems. In this study, the entropy features are computed by calculating the sample entropy and wavelet entropy measures, such as Shannon, norm, threshold, sure and log-energy entropy, to extract the useful information hidden in the MRIs of prostate cancer subjects.
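A common simplified implementation of sample entropy, one of the complexity measures named above, is sketched here; the template-counting details (and the default m and r) are our assumptions and differ slightly from some published variants.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy: negative log of the conditional probability that
    sequences matching for m points (within tolerance r * std) also
    match for m + 1 points. Self-matches are excluded."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()

    def match_count(mm):
        templates = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        c = 0
        for i in range(len(templates)):
            # Chebyshev distance to all later templates (no self-matches)
            dist = np.abs(templates[i + 1:] - templates[i]).max(axis=1)
            c += int((dist <= tol).sum())
        return c

    B, A = match_count(m), match_count(m + 1)
    return float(-np.log(A / B))
```

A regular signal such as a sine wave yields a low value, while white noise yields a markedly higher one, matching the text's point that reduced complexity distinguishes pathological from healthy dynamics.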
Classification
Classification is a process of categorization in which objects and ideas are distinguished, recognized and understood. Based on the extracted features, the accuracy and other performance evaluation parameters are estimated using the trained model: the known label of each test sample is compared with the result predicted by the model. 10-fold cross validation was used for training, testing and validation purposes.
Support vector machine (SVM)
Among supervised learning methods, one of the most robust and generalizable classifiers is the SVM, which is widely used for classification problems. SVM is used in many applications such as pattern recognition [59], medical diagnosis [60, 61] and machine learning [62]. Currently, SVM is used in a variety of applications such as text recognition, speech recognition, emotion recognition, facial expression recognition, content-based image retrieval, biometrics, etc. SVM constructs a hyperplane or a set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification; a good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class (also known as the functional margin). Generally, a larger margin indicates that the classifier exhibits a lower generalization error. The SVM training procedure finds the hyperplane that gives the largest minimum distance to the training examples; in SVM theory this distance is known as the margin, and the optimal hyperplane is the one that maximizes it. Another important property of SVM is its strong generalization performance. SVM is basically a two-class classifier which, for non-linear training data, maps the data into a higher-dimensional space where it can be separated by a hyperplane.
Consider a hyperplane w·x + b = 0 separating the two classes, so that for training samples (x_i, y_i) with y_i ∈ {+1, −1}: w·x_i + b ≥ +1 for y_i = +1 and w·x_i + b ≤ −1 for y_i = −1, where w is the normal vector to the hyperplane and b is the bias. Combining the inequalities as: y_i(w·x_i + b) ≥ 1 for all i.
When the data is not linearly separable, a slack variable ξ_i ≥ 0 is introduced for each sample, and the soft-margin problem becomes: minimize (1/2)‖w‖² + C Σ_i ξ_i, subject to y_i(w·x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i. Here the first term, the regularization term, gives SVM the ability to generalize well on sparse data, while the empirical risk is computed using the second term, which accounts for samples that are misclassified or lie within the margin; C controls the trade-off between the two.
The corresponding dual problem is to maximize Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j), subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0, in which the α_i are the Lagrange multipliers; the training samples with α_i > 0 are the support vectors.
SVM for non-linearly separable data
The kernel trick is used to deal with data which is not linearly separable. In this case, a non-linear mapping transforms the input space into a higher-dimensional feature space, and the dot product between two vectors in the input space is replaced by a kernel function evaluated in the feature space. The most commonly used kernel functions are the polynomial, Gaussian and radial base function (RBF) kernels.
Mathematically, the kernels can be defined as:
SVM Polynomial Kernel: K(x_i, x_j) = (x_i·x_j + 1)^d
SVM Gaussian (RBF) kernel: K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²))
SVM Fine Gaussian (RBF) kernel: the same Gaussian kernel with a small kernel scale σ, which produces finely detailed decision boundaries
where d is the degree of the polynomial and σ is the kernel width (scale) parameter.
With a kernel, the dual problem keeps the same form with the dot product replaced by the kernel: maximize Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j), subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0.
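The kernel functions themselves are simple to write down; the following numpy sketch shows the polynomial and Gaussian (RBF) forms and a Gram-matrix builder (function names and default parameters are our own, and the offset c = 1 in the polynomial kernel is one common convention).

```python
import numpy as np

def polynomial_kernel(x, y, degree=3, c=1.0):
    """K(x, y) = (x . y + c)^d"""
    return float((np.dot(x, y) + c) ** degree)

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2))"""
    diff = np.asarray(x) - np.asarray(y)
    return float(np.exp(-np.sum(diff ** 2) / (2.0 * sigma ** 2)))

def gram_matrix(X, kernel):
    """Kernel (Gram) matrix K[i, j] = kernel(x_i, x_j) over a list of vectors."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])
```

Note that the Gaussian kernel of any vector with itself is 1, which is why the diagonal of its Gram matrix is all ones.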
The performance of the SVM classifier depends on several parameters. The grid search method was used to select the optimal parameter values, with the grid range and step size of the optimization parameters set carefully. The linear kernel involves only one parameter (‘
In a Decision Tree, the similarities in the dataset are examined and the data are accordingly classified into distinct classes. DTs were used by [63] for classifying data based on the choice of the attribute that maximizes the separation of the data. The attributes are split into several branches until the termination criterion is met. Mathematically, the following equations are used to construct a DT algorithm: the entropy of a set S is Entropy(S) = −Σ_i p_i log₂(p_i), and the information gain of an attribute A is Gain(S, A) = Entropy(S) − Σ_v (|S_v|/|S|) Entropy(S_v), where p_i is the proportion of samples in S belonging to class i, and S_v is the subset of S for which attribute A takes the value v.
The purpose of DTs is to forecast the observations of
where
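The entropy and information-gain computations used to grow such a tree can be sketched as follows (a minimal illustration of the splitting criterion, not the paper's implementation; binary splits are assumed).

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a class-label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, split_mask):
    """Entropy reduction from splitting the labels by a boolean mask
    (True -> left branch, False -> right branch)."""
    n = len(labels)
    left, right = labels[split_mask], labels[~split_mask]
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - children
```

A split that perfectly separates two balanced classes yields a gain of exactly one bit, the maximum possible for a binary problem.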
In machine learning, the naïve Bayes (NB) [64] classifier belongs to the family of probabilistic classifiers based on Bayes' theorem with strong independence assumptions between the features. NB is very popular in classification tasks [65] and has been studied extensively since the 1950s. Due to its good behaviour [66], NB is extensively used in recent developments [67, 68, 69, 70, 71] that try to improve NB performance. Moreover, during the learning process, NB requires a number of parameters linear in the number of features and is efficiently trained in a supervised learning setting; the maximum likelihood method is used for parameter estimation. NB is a conditional probability model computed using Bayes' theorem: given a problem instance to be classified, represented by a vector x = (x₁, …, xₙ) of n feature values, it assigns probabilities P(C_k | x₁, …, xₙ) for each of the K possible classes C_k.
Bayes' theorem is mathematically expressed as: P(C_k | x) = P(C_k) P(x | C_k) / P(x), where P(C_k) is the prior probability of the class, P(x | C_k) is the likelihood and P(C_k | x) is the posterior probability, and where the evidence P(x) = Σ_k P(C_k) P(x | C_k) is a scaling factor that depends only on the feature values and is therefore constant across classes.
NB has complexity O(tn) to induce the classifier over a dataset having t training instances and n attributes.
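A minimal NB classifier under one common instantiation of the model, Gaussian per-class likelihoods, is sketched below; the paper does not specify its likelihood form, so the Gaussian choice, class name and variance floor are our assumptions.

```python
import numpy as np

class GaussianNB:
    """Naive Bayes with per-class Gaussian likelihoods, fitted by maximum
    likelihood (class means, variances and priors)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # small floor keeps zero-variance features from dividing by zero
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        self.logprior = np.log([np.mean(y == c) for c in self.classes])
        return self

    def predict(self, X):
        # log P(C_k) + sum_i log N(x_i | mu_ki, var_ki), argmax over classes k
        ll = -0.5 * (np.log(2 * np.pi * self.var)[None]
                     + (X[:, None, :] - self.mu[None]) ** 2 / self.var[None]).sum(-1)
        return self.classes[np.argmax(ll + self.logprior, axis=1)]
```

Training touches each of the t samples and n features once per statistic, consistent with the O(tn) induction cost stated above.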
The performance of the ML classifiers in detecting prostate cancer was measured by computing the sensitivity, specificity, PPV, NPV and total accuracy.
Confusion matrix:
Sensitivity
The sensitivity measures the proportion of people who test positive for the disease among those who actually have the disease. Mathematically, it is expressed as: Sensitivity = TP / (TP + FN),
i.e. the probability of a positive test given that the patient has the disease.
Specificity measures the proportion of negatives that are correctly identified. Mathematically, it is expressed as: Specificity = TN / (TN + FP),
i.e. the probability of a negative test given that the patient is well.
PPV is mathematically expressed as: PPV = TP / (TP + FP),
where TP denotes the event that the test makes a positive prediction and the subject has a positive result under the gold standard, while FP is the event that the test makes a positive prediction and the subject has a negative result.
The total accuracy is computed as: TA = (TP + TN) / (TP + TN + FP + FN).
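All of these measures follow directly from the four confusion-matrix counts, as in this short sketch (the function name and dictionary layout are our own):

```python
def classification_metrics(tp, tn, fp, fn):
    """Performance measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "ppv":         tp / (tp + fp),          # positive predictive value
        "npv":         tn / (tn + fn),          # negative predictive value
        "fpr":         fp / (fp + tn),          # 1 - specificity
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
    }
```

For example, counts of TP = 90, TN = 80, FP = 20, FN = 10 give a sensitivity of 0.90, a specificity of 0.80 and a total accuracy of 0.85.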
The Jack-knife k-fold cross validation technique was applied for training/testing data formulation and parameter optimization. In this research, 2-, 4-, 5- and 10-fold CVs were used to evaluate the performance of the classifiers for the different feature-extraction strategies. The highest performance was obtained using 10-fold CV, which is the most commonly used and well-established method for evaluating classifier performance. In 10-fold CV, the data is divided into 10 folds; 9 folds participate in training, and the classes of the samples in the remaining fold are predicted based on the training performed on the 9 folds. For the trained models, the test samples in the test fold are entirely unseen. The whole process is repeated 10 times so that each sample is predicted exactly once. A similar approach is applied for the other CVs. Finally, the predicted labels of the unseen samples are used to determine the classification accuracy. This process is repeated for each combination of the system's parameters, and the classification performance is reported as depicted in the Tables.
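The fold construction described above can be sketched as follows; the function name, shuffling and seed are our own choices, and the paper's Matlab implementation may partition differently.

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and yield (train, test) index arrays
    for each of the k folds; every sample appears in exactly one test fold."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

Concatenating the test folds recovers every sample exactly once, which is what makes the per-sample predictions "purely unseen" by the model that produced them.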
Classification performance based on single feature extracting strategy using 10-fold CV
The ROC is plotted using the true positive rate (TPR), i.e. sensitivity, and the false positive rate (FPR), i.e. 1 − specificity, values of the prostate and brachytherapy subjects. The mean feature values for brachytherapy subjects are labelled as 1 and those for prostate subjects as 0. This vector is then passed to the ROC function, which plots each sample value against the specificity and sensitivity values. ROC analysis is one of the standard ways to diagnose and visualize the performance of a classifier [73]. The TPR is plotted against the FPR for each decision threshold.
Classification performance based on combination of features using 10-fold CV
Performance evaluation using test data (holdout 0.10) based on texture feature
ROC analysis based on single feature sets using a) texture b) morphological c) entropy d) EFDs e) SIFT.
ROC analysis based on combined feature set using a) texture 
ROC analysis based on combined feature set using a) entropy 
The classification performance was measured using different classifiers such as Decision Tree, SVM with linear, polynomial, RBF (radial base function) and Gaussian kernels, and the Bayesian approach. The performance was evaluated by extracting features (texture, morphological, SIFT, EFDs and entropy-based features) as shown in Table 1. The performance was measured using sensitivity (Sens.), specificity (Spec.), PPV, NPV, TA, FPR, AUC, error and the 95% confidence interval (lower bound (L) and upper bound (U)), as reflected in Tables 1 to 3.
Based on the single feature-extraction methodology with 10-fold CV, the highest performance was obtained using texture features with the SVM Gaussian kernel, i.e. sensitivity (98.24%), specificity (96.34%), PPV (98.25%), NPV (98.67%), TA (98.24%) and AUC (0.999), followed by SVM polynomial with sensitivity (98.09%), specificity (97.45%), PPV (98.10%), NPV (97.17%), TA (98.09%) and AUC (0.9968). Moreover, based on texture features, the other classifiers performed as follows: Bayes gives a TA of 95.45%, followed by Decision Tree with a TA of 95.01% and SVM RBF with a TA of 91.06%. Likewise, the highest performance based on morphological features was obtained using SVM RBF with sensitivity (91.70%), specificity (88.64%), PPV (91.74%), NPV (87.90%) and AUC (0.9596), followed by SVM Gaussian with TA (90.83%), SVM polynomial with TA (90.25%), DT with TA (88.94%) and Bayes with TA (86.17%). The performance obtained by extracting the other features using the ML classification methods was: EFDs features using DT (TA
The cancer detection accuracy was enhanced by using combinations of the different extracted features, as depicted in Table 2. The highest performance using a combination of features, i.e. texture
Table 3 depicts the evaluation performance using test data (10% holdout) based on texture features with the ML classifiers. The overall highest accuracy was obtained using SVM RBF (TA
ROC analysis based on single features using SVM RBF at different fold CVs for features a) texture b) morphological c) entropy d) EFDs e) SIFT.
The AUC values using single features with the different ML (machine learning) classifiers are obtained as reflected in Fig. 4. The highest separation (AUC
Performance evaluation with different folds CVs on texture features by applying ML classifiers a) Bayes b) decision tree c) SVM Gaussian d) SVM RBF e) SVM polynomial.
Prediction model based on mean 
Mean values with 5% percentage error for selected morphological features to distinguish the prostate and brachytherapy subjects.
Figures 5 and 6 reflect the AUC values of the combined features. The AUC value for the combination of EFDs
The comparisons of the AUC values of single features with different cross-fold validations (i.e. 2, 4, 5 and 10) for the different classifiers are shown in Fig. 7. The highest separation (AUC
The performance was also evaluated with different cross-fold validations (i.e. 2, 4, 5 and 10), as depicted in Fig. 8. Based on texture features, the Bayes classifier gives a sensitivity of 94.87% (2-fold), 94.87% (4-fold), 95.01% (5-fold) and 95.45% (10-fold); a specificity of 94.95% (2-fold), 94.07% (4-fold), 93.84% (5-fold) and 95.19% (10-fold); a PPV of 95.08% (2-fold), 94.97% (4-fold), 95.07% (5-fold) and 95.58% (10-fold), and so on. Similarly, based on texture features, for the other classifiers the TA using DT was 95.31% (2-fold), 96.48% (4-fold), 95.01% (5-fold) and 95.01% (10-fold); using SVM RBF, 87.68% (2-fold), 90.32% (4-fold), 89.30% (5-fold) and 91.06% (10-fold); and using SVM polynomial, 97.95% (2-fold), 96.63% (4-fold), 98.53% (5-fold) and 98.09% (10-fold). It was observed that 10-fold CV gives higher performance in most cases than the other folds, and it is the most commonly used k-fold CV method for measuring the validation performance of classifiers. Thus, for the remainder of the work, the 10-fold CV method was employed.
In Fig. 9, the blue color denotes the means of the brachytherapy subjects and the red color denotes the prostate cancer subjects. The lines denote the correctly classified subjects, while
The performance was evaluated using ROC (receiver operating characteristic) analysis. The accuracy measure indicates the proximity of a measured value to the true value. The proportion of positive results, i.e. the percentage of patients correctly identified as having prostate cancer, is measured by the sensitivity, while the proportion of negative results, i.e. the percentage of patients correctly identified as normal, is measured by the specificity. In the ROC curve, the sensitivity is plotted as a function of (1 − specificity) for different operating points. Each operating point on the ROC plot denotes a specificity/sensitivity pair corresponding to a particular decision threshold. Perfect discrimination is an ROC curve passing through the coordinate (0, 1), the upper left corner of the ROC space. Therefore, the closer the ROC curve is to the upper left corner, the higher the overall accuracy of the test (i.e. maximum sensitivity and specificity). The threshold value for the operating point was selected closest to the coordinate (0, 1). Likewise, the NPV (negative predictive value) and PPV (positive predictive value) are computed, denoting the proportions of negative and positive test results for prostate cancer that were correctly diagnosed, respectively.
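The threshold sweep described above can be sketched as follows; the function names and the trapezoidal AUC rule are our choices, and real toolboxes handle score ties and class weights more carefully.

```python
import numpy as np

def roc_curve(scores, labels):
    """TPR/FPR pairs obtained by sweeping the decision threshold over the
    classifier scores sorted in descending order (labels: 1 = positive)."""
    order = np.argsort(-np.asarray(scores))
    labels = np.asarray(labels)[order]
    tpr = np.concatenate([[0.0], np.cumsum(labels) / labels.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - labels) / (1 - labels).sum()])
    return fpr, tpr

def auc(fpr, tpr):
    """Trapezoidal area under the TPR-vs-FPR curve."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))
```

A perfectly separable score set produces a curve through the upper left corner (0, 1) and an AUC of exactly 1.0, matching the notion of perfect discrimination described above.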
In this research, robust ML (machine learning) classification techniques such as SVM kernels, the Bayesian approach and Decision Tree are used to distinguish the prostate cancer subjects from the brachytherapy subjects. High-resolution images exhibit higher nonlinear dynamics and complexity, and due to the large variations in size and shape they require multidimensional feature-extraction strategies to effectively detect the cancer in an image. Thus, to handle this problem, different feature-extraction strategies are employed, such as scale-invariant feature transform (SIFT), texture, morphology and elliptic Fourier descriptors (EFDs). To distinguish the brachytherapy subjects from the prostate cancer subjects, the ML classification techniques such as SVM and its kernels, Decision Tree and the Bayes approach are developed in Matlab version 2016. Cross validation (Jack-knife 10-fold) was used to train and test the MR image database. The performance was evaluated using several measures (specificity, sensitivity, PPV, NPV, FPR and AUC). Both single and combined feature-extraction strategies were devised to evaluate the performance. The higher classification accuracies based on the single texture and morphological features were obtained using the SVM kernels, whereas combinations of different features, such as morphological with EFDs and texture, give higher accuracy than single features, followed by texture features with entropy and EFDs, using the SVM kernels, DTs and the Bayes approach. In the past, researchers used only a few single-feature strategies and few combined features to detect prostate cancer. However, the results reported in this study reveal that the present feature-extraction strategy is more effective for diagnosing and detecting prostate cancer, yielding higher specificity and sensitivity and a higher detection ratio for prostate cancer.
