Abstract
BACKGROUND:
Classifying strawberry cultivars as suited to fresh consumption or to processing is important for making the best use of each cultivar. The aim of this study was to investigate whether support vector machine (SVM) and extreme learning machine (ELM) models could assist in the classification of 15 strawberry cultivars. Twenty-two characteristic indexes were analyzed, including both appearance and nutritional indexes.
RESULTS:
Classification accuracies of 100% and 88.52% were obtained with SVM and ELM, respectively, using 3-fold cross validation. Moreover, seven characteristic variables extracted by SVM from the 22 quality indexes make it possible to determine the adaptability of a particular cultivar by measuring a relatively small number of indexes.
CONCLUSION:
Both the ELM and SVM models are feasible for identifying fresh and processing cultivars. However, SVM performed better in terms of accuracy and simplicity, indicating that SVM would be a good choice for the classification of strawberry cultivars.
Introduction
Strawberries (
Strawberry juice is one of the most popular strawberry products. The color, aroma, texture and nutritional value of strawberry juice depend on the cultivar used in processing. Generally, juice yield, pH, ascorbic acid and total phenolics are the indexes used to evaluate suitability for juice processing [4, 5]. The activities of endogenous enzymes such as polyphenol oxidase (PPO), peroxidase (POD) and pectin methylesterase (PME) [6] are also considered, since these enzymes have an important impact on the sensory quality of strawberry juices [7]. Using only a few indexes to reflect the impact of cultivar on juice quality has undeniable limitations. However, measuring every index of a cultivar for classification is time consuming and labor intensive. Existing classification methods based on appearance are applicable only to fresh-eating varieties and do not distinguish fresh varieties from processing varieties. A specific evaluation of quality characteristics for classifying fresh and processing strawberry cultivars is still lacking in industry.
SVM and ELM neural network approaches have been extensively applied to cultivar identification and, combined with modern instrumental analysis methods, have given good classification results [8–10]. The SVM classification algorithm is a promising method with many attractive advantages. It does not need any assumptions about the functional form of the transformation because the kernel implicitly contains a non-linear transformation [11]. It is capable of both classification and regression. In addition, it does not need a large number of training samples to develop a model and it is not affected by the presence of outliers [12]. As a supervised algorithm, SVM aims to find an optimal hyperplane that separates the objects of the different classes as correctly as possible. SVM can effectively avoid over-fitting because it is based on structural risk minimization rather than on minimizing the misclassification error on the training set. Therefore, it has good generalization performance and often performs well on different datasets [13].
ELM originated from feedforward neural networks and was then developed for single-hidden-layer feedforward neural networks (SLFNs): it randomly chooses the input weights and analytically determines the output weights of the SLFN [14]. Because of its unique network output structure, the ELM algorithm can learn fast with high generalization performance and can implement multi-class classification quickly [10]. ELM has been reported to be more accurate than its competitors in most cases. Moreover, at the classification stage, ELM performed much faster than K-nearest neighbor (KNN), SVM, and back propagation artificial neural networks (BP-ANN).
Other methods were not selected for several reasons. The parameters of BP-ANN are generally learned via gradient descent algorithms, which are relatively slow and raise many convergence issues such as stopping criteria, learning rate, learning epochs, and local minima. KNN runs slowly and its classification accuracy depends closely on the dataset. Partial least-squares discriminant analysis (PLS-DA) sometimes has difficulty yielding satisfactory performance because of nonlinearity and over-fitting [13].
Classification of strawberry cultivars has been studied by several groups in recent years, based on mathematical models combined with evaluation of fruit appearance [15–17]. For example, Yamamoto et al. used an image analysis system combined with cluster analysis, multidimensional scaling and discriminant analysis of appearance characteristics to classify strawberry cultivars [15].
In addition, strawberries have multiple quality features, so appearance and nutritional components need to be analyzed together for a correct evaluation. This study was designed to compare and classify fifteen strawberry cultivars by measuring the following indexes: color indexes (including
Materials and methods
Chemicals
Methanol, acetonitrile and formic acid of high-performance liquid chromatography (HPLC) grade were purchased from Honeywell Burdick & Jackson (SK Chemicals, Seoul, Korea). Folin-Ciocalteu’s phenol reagent, ascorbic acid standard, 2,2-diphenyl-1-picrylhydrazyl (DPPH), sugar standards (sucrose, glucose, fructose), 6-hydroxy-2,5,7,8-tetramethylchroman-2-carboxylic acid (Trolox) and 2,4,6-tris(2-pyridyl)-s-triazine (TPTZ) were purchased from Sigma-Aldrich Co. (Shanghai, China). Ethanol, hydrochloric acid (HCl), sulfuric acid, phosphate buffer, sodium hydroxide (NaOH), methanal, galacturonic acid, sodium acetate, gallic acid, sodium tetraborate, guaiacol, sodium carbonate, catechol and other chemicals of analytical grade were purchased from Beijing Chemicals Co. (Beijing, China).
Plant materials
Fruits of fifteen strawberry (
Sample preparation
Fruits of all fifteen strawberry cultivars were at 75% ripeness. After harvest or purchase, fresh strawberry fruits were used for hardness and juice yield analysis. Fruits that were not used in the hardness and juice yield analysis were immediately frozen in liquid nitrogen after the peduncle and calyx were removed, and stored at –80°C for analysis of pH, TSS, TA, the ratio of TSS/TA, pectin content, color indexes, TP, ACY, AA, sugar contents, antioxidant capacity and activities of endogenous enzymes (PPO, POD and PME). At the time of analysis, the frozen strawberry fruits were thawed at 4°C for 12 h and then crushed with a beater (MJ-25BM05A, Midea Co., Foshan, Guangdong).
Physico-chemical indexes
Physico-chemical indexes of strawberry were determined according to the methods proposed by Cao [18]. The values of the pH, TSS, TA, pectin content, hardness, juice yield and the ratio of TSS/TA were determined.
Color indexes
Color of strawberry fruits was expressed in
Nutritional indexes
The amount of ACY was determined by the pH-differential method described previously, with some modifications, and expressed as grams of cyanidin 3-glucoside (Cy-3-glu) per kilogram fresh weight (g Cy-3-glu/kg FW), using a molecular weight of 449.2 g/mol and a molar absorptivity of 26,900 L/(mol·cm) [20]. The TP content of the fruit samples was measured according to the Folin-Ciocalteu method and expressed as grams of gallic acid equivalents (GAE) per kilogram fresh weight (g GAE/kg FW) [21]. An HPLC method was used to determine the AA content of strawberry fruits, and a modified HPLC method was used for the quantification of sugars (sucrose, glucose, fructose and total sugars) [18].
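As a worked illustration of the pH-differential calculation, the following sketch converts absorbance readings into g Cy-3-glu/kg FW using the molecular weight and molar absorptivity stated above. The measurement wavelengths (520 and 700 nm), the pH 1.0 and pH 4.5 buffers, the 1 cm path length, and every sample value (absorbances, dilution factor, extract volume, sample mass) are assumptions chosen for illustration; they are not taken from this study.

```python
MW_CY3GLU = 449.2          # g/mol, as stated in the text
EPSILON_CY3GLU = 26900.0   # L/(mol*cm), as stated in the text


def anthocyanin_g_per_kg(a520_ph1, a700_ph1, a520_ph45, a700_ph45,
                         dilution_factor, extract_volume_l, sample_mass_kg,
                         path_length_cm=1.0):
    """Total monomeric anthocyanin as g Cy-3-glu per kg fresh weight."""
    # Haze-corrected absorbance difference between the two buffers
    delta_a = (a520_ph1 - a700_ph1) - (a520_ph45 - a700_ph45)
    # Beer-Lambert law gives the pigment concentration in the extract (g/L)
    conc_g_per_l = (delta_a * MW_CY3GLU * dilution_factor
                    / (EPSILON_CY3GLU * path_length_cm))
    # Scale by extract volume and sample mass to express per kg fresh weight
    return conc_g_per_l * extract_volume_l / sample_mass_kg


# Hypothetical readings: 5 g of fruit extracted into 50 mL, diluted 5-fold
acy = anthocyanin_g_per_kg(0.520, 0.010, 0.120, 0.010,
                           dilution_factor=5,
                           extract_volume_l=0.050,
                           sample_mass_kg=0.005)
print(f"{acy:.3f} g Cy-3-glu/kg FW")
```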
Antioxidant capacity
The antioxidant capacity was measured using the DPPH assay described previously and the FRAP assay according to the method proposed by Benzie and Strain, with some modifications [22]. The results of the DPPH and FRAP assays were both expressed as millimole Trolox equivalent (TE) per kilogram fresh weight (mM TE/kg FW) [19].
Activities of endogenous enzymes
PPO and POD activities were determined spectrophotometrically as the change in absorbance at 420 nm and 470 nm, respectively, according to the procedure described by Cao with some modifications. The PME assay was performed by a potentiometric titration method with some modifications [18].
Classification model
Support vector machine (SVM) classification model
As an effective classification method, SVM was proposed on the basis of statistical learning theory by Cortes and Vapnik [24]. The SVM learning algorithm applies one hidden layer of non-linear neurons, one linear output neuron and a specialized learning procedure that leads to the global minimum of the error function and excellent generalization ability of the trained network [24].
In the standard two-class classification problems, a set of training data T = { (
When solving this problem, we can get the classification decision function
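For a linear kernel, the decision function of a two-class SVM has the familiar form f(x) = sgn(w·x + b). The sketch below illustrates this with a soft-margin objective, 0.5·||w||² + C·Σ max(0, 1 − y_i(w·x_i + b)), minimized by plain subgradient descent. This is a stand-in for illustration only: it is not the solver used by LIBSVM, and the function names, learning rate, number of epochs, value of C and the toy data are all assumptions.

```python
def train_linear_svm(X, y, C=1.0, lr=0.001, epochs=5000):
    """Subgradient descent on 0.5*||w||^2 + C*sum(hinge losses)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)   # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # margin violated, so the hinge loss is active
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decide(w, b, x):
    """Decision function f(x) = sgn(w.x + b) for a linear kernel."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, made-up data with two features; labels +1 and -1 mark the two classes
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
predictions = [decide(w, b, x) for x in X]
print(predictions)
```

In practice the 22 quality indexes would be scaled to comparable ranges before training, since the linear decision function weights each index directly.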
ELM, a new learning scheme for feedforward neural networks, was first proposed by Huang et al. [13]. Compared with traditional computational intelligence techniques, ELM provides better generalization performance at an extremely fast learning speed, with better nonlinear processing capacity [25]. The ELM learning algorithm is mainly applied to classification and regression problems.
Given a training set ℵ ={ (
Equation (2) can be written compactly as:
It has been proved that one may randomly choose and fix the hidden node parameters and the output weights
For an unknown sample
Where
All calculations were performed in Matlab 2007a under Windows XP with a 3.2 GHz CPU and 4 GB of memory, and the SVM algorithm was implemented with the LIBSVM (Version 2.9) toolbox.
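The ELM training procedure described above (random, fixed hidden-node parameters followed by an analytical solution for the output weights) can be sketched in a few lines. Standard ELM obtains the output weights with the Moore-Penrose generalized inverse of the hidden-layer output matrix H; to keep this sketch dependency-free, it instead solves the ridge-regularized normal equations (HᵀH + λI)β = Hᵀt, which is a swapped-in approximation. The function names, the number of hidden nodes, the ridge value λ, the random seed and the toy data are all assumptions, and this is not the authors' Matlab implementation.

```python
import math
import random


def hidden_layer(X, weights, biases):
    """Sigmoid outputs of the hidden nodes for every sample (matrix H)."""
    return [[1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(w_i, row)) + b_i)))
             for w_i, b_i in zip(weights, biases)] for row in X]


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][c] * z[c] for c in range(i + 1, n))) / M[i][i]
    return z


def train_elm(X, t, n_hidden=5, ridge=1e-6, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    # Step 1: input weights and biases are drawn at random and never updated
    weights = [[rng.uniform(-1, 1) for _ in range(n_features)]
               for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    # Step 2: hidden-layer output matrix H
    H = hidden_layer(X, weights, biases)
    # Step 3: output weights beta from (H^T H + ridge*I) beta = H^T t
    HtH = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    Htt = [sum(h[i] * ti for h, ti in zip(H, t)) for i in range(n_hidden)]
    beta = solve(HtH, Htt)
    return weights, biases, beta


def predict_elm(model, X):
    """Class label is the sign of the network output H*beta."""
    weights, biases, beta = model
    H = hidden_layer(X, weights, biases)
    return [1 if sum(h_i * b_i for h_i, b_i in zip(h, beta)) >= 0 else -1
            for h in H]


# Toy, made-up two-class data; labels +1 and -1 mark the two classes
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]]
t = [-1, -1, -1, 1, 1, 1]
model = train_elm(X, t)
predictions = predict_elm(model, X)
print(predictions)
```

Because the hidden-node parameters are never tuned, the only iterative choice left is the number of hidden nodes, which is why training is fast compared with gradient-based networks.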
Statistical analysis
All extractions and measurements were performed in triplicate, except for the hardness assay (10 replicates). The experimental data are reported as mean ± standard deviation (SD). Analysis of variance (ANOVA) was performed using SPSS software (version 17.0). Statistical differences with
Results and discussion
Characteristics analysis of different strawberry cultivars
Sugars, TSS, pH, TA, the ratio of TSS/TA analysis
The sugar, TSS, pH, TA and TSS/TA ratio values of the fifteen cultivars are shown in Table 1. Sugars are the main soluble components in strawberry fruit, with sucrose, glucose and fructose together accounting for more than 99% of the total sugar content. The total sugar contents were high in cvs. ‘Akihime’, ‘Benihoppe’ and ‘Sachinoka’ (from 52.02 to 55.39 g/kg FW), whereas the fruits of cv. ‘Cream XI’ contained a much smaller amount of total sugar (31.73 g/kg FW). Cv. ‘Benihoppe’ had the highest contents of glucose and fructose, resulting in a high total sugar content (54.51 g/kg FW). The sugar contents (sucrose, glucose, fructose and total sugars) of the 15 strawberry cultivars in this study were in a similar range to those of 13 strawberry cultivars grown in Slovenia [32]. The TSS values ranged from 5.83% to 10.67%. The great variability in sugar and TSS indexes among the 15 strawberry cultivars in the present study is in agreement with previous studies [28–32].
The pH, titratable acid (TA), total soluble solids (TSS), the ratio of TSS/TA, juice yield, hardness, pectin content of fifteen strawberry cultivars
a Titratable acidity is expressed as citric acid. b Data analyses were carried out by using SPSS Version 17.0. Data were represented as mean value ± standard deviation (SD) of at least a triplicate analysis. Values in the same column followed by different letters indicate significant differences at P < 0.05 level of LSD test.
The ratio of TSS/TA varied strongly among the 15 cultivars, with more than a 2-fold difference between the cultivars with the lowest values (‘Fugilia’ and ‘San Andreas’, 8.23 and 8.41) and the cultivar with the highest value (‘Akihime’, 20.01). It has been reported that the TSS/TA ratio affects the overall flavor of strawberry fruits more than the TSS or TA value alone [4], and it has been identified as a major factor determining the quality of strawberry products.
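The fold difference quoted for the TSS/TA ratio can be checked with simple arithmetic. The sketch below uses only the three ratio values given in the text; the helper `tss_ta_ratio` and its example inputs are hypothetical and only show how the ratio is formed from a TSS and a TA measurement expressed in the same units.

```python
def tss_ta_ratio(tss_percent, ta_percent):
    """Ratio of total soluble solids to titratable acidity (same units)."""
    return tss_percent / ta_percent


# TSS/TA ratios quoted in the text for the extreme cultivars
ratios = {"Fugilia": 8.23, "San Andreas": 8.41, "Akihime": 20.01}
lowest = min(ratios.values())
highest = max(ratios.values())
fold = highest / lowest
print(f"highest / lowest = {fold:.2f}-fold")

# Hypothetical measurement, not from the study: 9.0% TSS and 0.75% TA
example_ratio = tss_ta_ratio(9.0, 0.75)
print(f"example TSS/TA = {example_ratio:.1f}")
```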
The content of pectin in the strawberry fruits varied from 0.71 g/kg FW in the cv. ‘Benihoppe’ to 1.75 g/kg FW in the cv. ‘Albion’. Pectin substances are correlated with fruit texture, and their degradation reduces the ability of a juice to hold its solid portion in suspension throughout storage [6].
Fruits of the cv. ‘Sweet Charlie’ were the softest, with an average hardness value of 65.04 g/cm2. The highest hardness value, 382.88 g/cm2, was observed in ‘San Andreas’ and is close to 6 times the lowest value. Strawberry fruits with lower hardness are extremely prone to mechanical damage during transport and storage, which limits the post-harvest shelf life of the cultivar.
Juice yield is the most important indicator for juice production. ‘Saga’ exhibited the highest juice yield (72.02%) and ‘Albion’ the lowest (44.45%).
Color analysis
The results of color indexes (
Color parameters (L*, a* and b*) of fifteen strawberry cultivars
Data analyses were carried out by using SPSS Version 17.0. Data were represented as mean value±standard deviation (SD) of at least a triplicate analysis. Values in the same column followed by different letters indicate significant differences at P < 0.05 level of LSD test.
Anthocyanins (ACY) are the most abundant polyphenols in strawberry. In this study, the fruits of the cvs. ‘Monterey’, ‘Portola’ and ‘Fugilia’ developed high contents of ACY (from 0.22 to 0.23 g Cy-3-glu/kg FW), while the fruits of the cv. ‘Japan II’ attained very small amounts of ACY (0.05 g Cy-3-glu/kg FW).
The results for TP, ACY and AA contents as well as sugar contents (sucrose, glucose, fructose and total sugars) of the different strawberry cultivars are shown in Table 3. The TP contents differed greatly among the fruits of the different strawberry cultivars. A high intake of bioactive compounds, especially phenolic compounds, may lower the risk of some diseases, such as cancer, cardiovascular and other chronic diseases [2].
Total phenolics (TP), total anthocyanin (ACY), ascorbic acid (AA) and sugar contents (sucrose, glucose, fructose and total sugars) of fifteen strawberry cultivars
Data analyses were carried out by using SPSS Version 17.0. Data were represented as mean value ± standard deviation (SD) of at least a triplicate analysis. Values in the same column followed by different letters indicate significant differences at P < 0.05 level of LSD test.
Antioxidant capacity varied widely among the 15 strawberry cultivars (Table 4). The fruits of the cv. ‘Portola’ had the highest antioxidant capacity (DPPH, FRAP) values (74.40 and 22.62 mM TE/kg FW, respectively), whereas the lowest antioxidant capacity (DPPH, FRAP) values were observed in the cv. ‘Saga’ (32.17 and 11.10 mM TE/kg FW, respectively). The results of the DPPH and FRAP assays were closely correlated across the 15 cultivars, suggesting that the two assays are comparable and nearly interchangeable in the case of strawberry [34].
Antioxidant capacity of fifteen strawberry cultivars
Data analyses were carried out by using SPSS Version 17.0. Data were represented as mean value±standard deviation (SD) of at least a triplicate analysis. Values in the same column followed by different letters indicate significant differences at
The activities of PPO and POD differed significantly among the 15 strawberry cultivars (Table 5). The fruits of cv. ‘Sweet Charlie’ exhibited PPO and POD activities of 0.3481 U/g FW and 1.3129 U/g FW, respectively, which were significantly higher than in any other cultivar. PPO and POD exist widely in plants and are involved in enzymatic browning, which not only affects appearance and flavor but also reduces the nutrient content of fruits and vegetables. The degradation of anthocyanins might be caused by the residual activities of PPO and POD in strawberry juice, as reported previously [35]. In addition, PPO and POD activities cause the degradation of ascorbic acid and polyphenol compounds, which can lead to browning, discoloration and loss of antioxidant activity in cold-stored strawberry fruit [36].
Activities of endogenous enzymes (PPO, POD and PME) of fifteen strawberry cultivars
Data analyses were carried out by using SPSS Version 17.0. Data were represented as mean value±standard deviation (SD) of at least a triplicate analysis. Values in the same column followed by different letters indicate significant differences at
PME is a major food quality enzyme that has been found in plants such as strawberry, apple, orange, soybean and tobacco, as well as in pathogenic fungi and bacteria. It catalyzes the hydrolysis of the methyl ester groups of pectin and leads to the formation of a calcium pectate gel [36]. The PME activities of the different strawberry cultivars ranged from 0.0028 U/g FW (cv. ‘Cream XI’) to 0.0088 U/g FW (cv. ‘Sachinoka’) in the current study. The activity of PME has an obvious effect on the observable quality of fresh and processed products, for example by reducing the stability of vegetable and fruit juices [37, 38]. Harmful effects of PME activity on the cloud stability of juices have been reported in detail [39]. Thus, PME control is very important for maintaining the stability of strawberry products.
Great variability existed among the examined cultivars in their quality characteristics, and there were also differences from previous results. The highest ACY content in our results (0.23 g Cy-3-glu/kg FW) was lower than the value reported previously (0.66 g Cy-3-glu/kg FW) [40]. Moreover, the mean sucrose content in this study was somewhat higher than that of 13 strawberry cultivars in previous research [33], while the average glucose and fructose contents were lower. The antioxidant capacity values obtained in this study were somewhat lower than those reported in a previous study [41]. The variations in physico-chemical and nutritional indexes, antioxidant capacity and activities of endogenous enzymes between different studies can be explained by differences in genotype, cultivar, growing conditions, degree of ripeness and post-harvest handling techniques [31].
SVM and ELM were used in the classification of 15 strawberry cultivars based on 22 quality indexes, including physico-chemical indexes (pH, TSS, TA, the ratio of TSS/TA, juice yield, hardness, pectin content), color indexes (
SVM network for classification
In the present study, the 1-norm SVM algorithm was applied to build the strawberry cultivar classification model. The first important step in establishing an SVM classification model is the choice of kernel function. By choosing an appropriate kernel, more emphasis can be placed on the similarity between samples. Different kernel functions have been proposed and widely applied in past research. Linear, polynomial of a given degree, radial basis function (RBF) and multi-layer perceptron (MLP) kernels are the most popular and are generally used for both discrete and continuous data [11].
Among the available kernel functions, the linear kernel was chosen for this model. Because the precision of the model is greatly influenced by the kernel parameters, the parameters should be optimized after the appropriate kernel function has been selected [42]. In this model, the parameter
The SVM performed well, achieving a cultivar classification accuracy of 100% with 3-fold cross validation. On the other hand, in the SVM classification model, the ratio of TSS/TA,
Results of SVM
The ELM method was also applied, and its optimal model parameters had to be found. Parameter selection for ELM is relatively simple. The most important step is to determine the number of hidden layer nodes of the ELM model, which can be done by trial and error. The number of hidden layer nodes significantly affects the precision of ELM. The activation function used in our ELM models is the sigmoidal function
Comparative classification performance of SVM and ELM for strawberry cultivars
The classification accuracies for strawberry cultivars of the 1-norm SVM and ELM models reached 100% and 88.52%, respectively (Table 7). In the ELM classification, the ‘Sachinoka’ and ‘Allstar’ cultivars were not classified correctly. The LIBSVM toolbox was used to implement SVM, and cross-validation was used for parameter selection.
A two-class problem confusion matrix of ELM
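For readers unfamiliar with two-class confusion matrices, the sketch below shows how overall accuracy and the per-class rates follow from the four cells. The cell counts are made up for illustration and are not the values from this study, and treating one class (for example, fresh cultivars) as the "positive" class is an arbitrary labeling assumption.

```python
def two_class_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from a 2x2 confusion matrix.

    tp: positives predicted positive    fn: positives predicted negative
    fp: negatives predicted positive    tn: negatives predicted negative
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total     # share of all samples classified correctly
    sensitivity = tp / (tp + fn)     # share of positives recovered
    specificity = tn / (tn + fp)     # share of negatives recovered
    return accuracy, sensitivity, specificity


# Made-up counts, for illustration only
accuracy, sensitivity, specificity = two_class_metrics(tp=8, fn=2, fp=1, tn=9)
print(f"accuracy={accuracy:.2%}, sensitivity={sensitivity:.2%}, "
      f"specificity={specificity:.2%}")
```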
The training speed of ELM is much faster than that of SVM, which is similar to the results of Liu [43]. Because cross-validation is used to select the SVM parameters, parameter selection takes a long time when the training set is large. On the other hand, ELM can achieve ideal classification performance only when the number of hidden nodes is large enough. Because of its unique training method, a globally optimal solution can be obtained in a single step.
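The cost of cross-validation mentioned above comes from refitting the model once per fold (and once per candidate parameter value). A minimal sketch of 3-fold cross-validation is given below. The interleaved fold assignment, the function names and the toy one-feature nearest-mean classifier are assumptions made to keep the example self-contained; they do not reproduce how the LIBSVM toolbox assigns folds or the models used in this study.

```python
def k_fold_indices(n_samples, k=3):
    """Yield (train_idx, test_idx) pairs for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(X, y, fit, predict, k=3):
    """Fraction of samples predicted correctly when each is held out once."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predicted = predict(model, [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(predicted, test_idx))
    return correct / len(X)


def fit_nearest_mean(X, y):
    """Toy classifier: store the mean feature value of each class."""
    means = {}
    for label in set(y):
        values = [x for x, lab in zip(X, y) if lab == label]
        means[label] = sum(values) / len(values)
    return means


def predict_nearest_mean(means, X):
    """Assign each sample to the class with the closest stored mean."""
    return [min(means, key=lambda lab: abs(x - means[lab])) for x in X]


# Made-up single-feature data; labels are illustrative only
X = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1, 1.05]
y = ["fresh"] * 4 + ["processing"] * 4 + ["fresh"]
cv_accuracy = cross_validated_accuracy(X, y, fit_nearest_mean, predict_nearest_mean)
print(f"3-fold cross-validated accuracy: {cv_accuracy:.2%}")
```

Selecting a parameter by cross-validation repeats this loop for every candidate value, which is why the cost grows quickly with the size of the training set.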
This paper has presented a method for classifying strawberry cultivars using SVM and ELM algorithms based on quality indexes of strawberry fruits. Fifteen cultivars of strawberries were characterized and compared by measuring their quality indexes. The quality index data obtained in the study were used successfully to establish strawberry classification models (SVM and ELM). In other words, the classification models obtained in the study can be used to test whether a given unknown strawberry cultivar is suitable for fresh consumption or for juice processing. The SVM and ELM models reached classification accuracies of 100% and 88.52%, respectively. Moreover, seven characteristic variables (the ratio of TSS/TA,
Footnotes
Acknowledgments
This work was supported by the National Key R&D Program of China No. 2017YFD0400700, Special Fund for Agro-scientific Research in the Public Interest No. 201303073 and Project No. 2012BAD31B05 of the Key Technologies R&D Program of China. The authors thank Liming Yang for support with model operation.
The authors declare they have no actual or potential competing financial interests.
