Abstract
Although there are many independent studies on either the detection or the classification of white blood cells, few papers have considered the two together. This study proposes a method for recognizing five types of leukocytes based on multi-scale region growing and mean-shift clustering. The key idea of the proposed method is to extract the texture features of leukocytes in a visual manner; unlike traditional algorithms, it is a non-parametric texture feature extraction method. Finally, an SVM (Support Vector Machine) is used for classification. On the leukocyte images tested, the overall correct recognition rate reached 97.96%, indicating the feasibility and robustness of the proposed method.
Introduction
In the medical field, the analysis and identification of leukocytes are of vital importance for diagnosing diseases such as acquired immune deficiency syndrome, blood cancer, and leukemia. In particular, changes in the distribution of the five types of leukocytes (basophils (B), lymphocytes (L), neutrophils (N), monocytes (M), and eosinophils (E)) are connected with the condition of the human immune system. This analysis can be conducted using either automated or manual methods. Automated methods include flow cytometry and automated counting. These instruments can assess white blood cells (WBCs) quantitatively but not qualitatively, and they do not benefit from image processing technologies. Applying image processing technology can provide a qualitative assessment that supports judgment. In addition, manual tasks such as expert inspection of blood cells are tedious and prone to errors.1,2 Thus, an automated system based on image processing technology can help hematologists accelerate the process. Hence, computer-aided identification methods3–6 have been developed to replace manual methods.
Generally speaking, an automatic leukocyte recognition system is composed mainly of three key steps: leukocyte detection, feature extraction, and classification. To a certain extent, correctly separating leukocytes from their background is the first step towards success. For leukocyte extraction, researchers have recently proposed a series of algorithms for accurate segmentation and classification of WBCs, for example the work of Ghosh et al.7 A hemocytometer provides more accurate WBC segmentation results than manual counting, but the preparation process requires expertise. Secondly, feature extraction plays a decisive role in the entire process, because a set of effective features can not only compensate for shortcomings in segmentation but also reduce the pressure on the classifier. Leukocyte features consist mainly of geometric features,8 histogram features,9–11 and texture features.12–15 Geometric features are effective in most cases; however, errors are easily made for the small number of deformed cells, such as deformed lymphocytes and eosinophils, which can be difficult to distinguish. In such cases, the cells can be identified effectively by texture features. The gray level co-occurrence matrix (GLCM) and local binary pattern (LBP)16 can be used, but appropriate parameters for these two methods must be selected from experience; otherwise the extraction is poor. Hence, a robust and non-parametric texture feature extraction method is needed. Finally, among classification algorithms, SVM,17 artificial neural networks, and decision trees are the most commonly used. Zheng et al.18 combined the expectation maximization clustering algorithm with SVM for leukocyte segmentation and classification; this approach relies on a selected color feature vector. Zhao et al.,19 Arslan et al.,20 Li et al.,21 and Zhang et al.22 used a color space and a morphological algorithm to extract WBCs and then applied a convolutional neural network (CNN) for WBC classification. When a CNN is applied to medical images, the first problem is the limited number of training samples.
Although the above methods of segmentation and recognition differ, their common aim is to improve the accuracy of segmentation and classification. In this work, leukocytes are extracted from cell images based on the image color space, distance transformation, and GVF (gradient vector flow) Snake. A non-parametric texture algorithm based on mean-shift is then adopted to extract the texture features of WBCs. Finally, SVM is used for classification.
Extraction of leukocytes
The proposed segmentation scheme is shown in Figure 1. Leukocytes were extracted from cell images based on color space, distance transformation, and GVF Snake. The HSI (Hue-Saturation-Intensity) color model was selected after observing the different components of different color models. As shown in Figure 1, the H component of the HSI color model can distinguish the location of WBCs, which greatly improves the effect of clustering.

Figure 1. Flowchart of the proposed segmentation scheme.
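As an illustration of the H component mentioned above, the following is a minimal sketch of the common textbook RGB-to-HSI conversion for a single pixel. The function name `rgb_to_hsi` and the convention of returning hue 0 for gray pixels are assumptions made for this example; the text does not specify the exact conversion used.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (components in 0..255) to (H, S, I).

    H is in degrees in [0, 360); S and I are in [0, 1].
    Textbook HSI formulas; not necessarily the paper's implementation.
    """
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    intensity = (r + g + b) / 3.0
    if intensity == 0:
        return 0.0, 0.0, 0.0  # black pixel: hue and saturation undefined
    saturation = 1.0 - min(r, g, b) / intensity
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return 0.0, saturation, intensity  # gray pixel: hue set to 0 by convention
    # Clamp to [-1, 1] to guard against rounding before acos.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    hue = theta if b <= g else (360.0 - theta) % 360.0
    return hue, saturation, intensity
```

Applying this function to every pixel and keeping only the first returned value gives a hue image in which stained nuclei would be expected to stand out from the background.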
Feature extraction
After the leukocyte is extracted, its features must be extracted. GLCM and LBP can be used, but appropriate parameters for these two methods must be selected from experience; otherwise the extraction is poor. Hence, a robust and non-parametric texture feature extraction method is needed. The author adopts a non-parametric texture algorithm based on mean-shift to extract the texture features of WBCs.
Mean-shift algorithm
Mean-shift clustering is a non-parametric algorithm that can locate the maxima of a probability density function, which may represent a certain pattern feature. This algorithm has been applied successfully in image smoothing, image segmentation, and moving object tracking. In the d-dimensional space R^d, data sampling points x_i (i = 1, …, n) are given, and the basic mean-shift vector is defined as follows
There are
If a circular symmetric kernel is used, the profile function
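For orientation, the textbook form of the mean-shift vector is sketched below. The notation ($S_h$, $n_h$, $h$, $k$, $g$) follows the general mean-shift literature and is an assumption, not this paper's own notation or equation numbering.

```latex
% Basic (flat-kernel) mean-shift vector at x, with n_h samples inside S_h(x)
M_h(x) = \frac{1}{n_h} \sum_{x_i \in S_h(x)} (x_i - x),
\qquad
S_h(x) = \{\, y : (y - x)^{\mathsf T} (y - x) \le h^2 \,\}

% Kernel-weighted form for a radially symmetric kernel with profile k,
% using weights g(s) = -k'(s)
m_{h,g}(x) = \frac{\sum_{i=1}^{n} x_i \, g\!\left(\left\| \frac{x - x_i}{h} \right\|^2\right)}
                  {\sum_{i=1}^{n} g\!\left(\left\| \frac{x - x_i}{h} \right\|^2\right)} - x
```

Repeatedly replacing $x$ with $x + m_{h,g}(x)$ moves the point toward a local maximum of the estimated density, which is the property the feature-point search relies on.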
Algorithm steps
1. Selecting the data space: A color image offers multiple data spaces to choose from, such as gray scale, the RGB (Red, Green, Blue) color space, or the HSV (Hue, Saturation, Value) color space. Gray scale is advantageous because the gray-scale image effectively reduces the adverse effects of lighting and WBC staining.
2. Finding the feature points: The mean-shift algorithm is applied to the image to locate the coordinates of the probability density extreme points and obtain their gray values. However, the limited data may leave some texture features missing, which is why the extension in the next step is required.
3. Extending the feature areas: Cell texture is a natural texture, and its gray level varies widely. Even when the feature points have been found, extending the feature regions accurately can still be difficult. The regions are therefore extended using the region growing method; although the extension is not exact, subsequent tests show that it sufficiently meets the demand. The feature points found above serve as the seed points for region growing, with a gray-scale value lower than three gray levels as the growth termination condition, and a series of characteristic regions is finally obtained.
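The steps above can be sketched on a small gray-scale image as follows. This is a minimal illustration under stated assumptions, not the author's implementation: each pixel's gray value is used as its sample weight in a flat-kernel mean-shift, the three-gray-level condition is read as "a neighbor joins the region while it differs from the seed by fewer than three gray levels", and the names `mean_shift_mode` and `region_grow` and the `radius` parameter are hypothetical.

```python
from collections import deque

def mean_shift_mode(img, start, radius=2, max_iter=50):
    """Follow the weighted mean-shift vector from `start` to a local mode.

    Assumption: gray values act as sample weights, so the iteration
    climbs toward a local maximum of gray-weighted density.
    """
    rows, cols = len(img), len(img[0])
    y, x = start
    for _ in range(max_iter):
        wsum = ysum = xsum = 0.0
        for j in range(max(0, y - radius), min(rows, y + radius + 1)):
            for i in range(max(0, x - radius), min(cols, x + radius + 1)):
                if (j - y) ** 2 + (i - x) ** 2 <= radius ** 2:  # flat kernel
                    w = img[j][i]
                    wsum += w
                    ysum += w * j
                    xsum += w * i
        if wsum == 0:
            break
        ny, nx = round(ysum / wsum), round(xsum / wsum)
        if (ny, nx) == (y, x):  # shift rounds to zero: converged
            break
        y, x = ny, nx
    return y, x

def region_grow(img, seed, tol=3):
    """Grow a 4-connected region of pixels within `tol` gray levels of the seed."""
    rows, cols = len(img), len(img[0])
    seed_val = img[seed[0]][seed[1]]
    region, queue = {seed}, deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < rows and 0 <= nx < cols
                    and (ny, nx) not in region
                    and abs(img[ny][nx] - seed_val) < tol):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region

# Synthetic 7x7 image with one bright 3x3 blob (illustrative data only).
img = [[10] * 7 for _ in range(7)]
for j in range(2, 5):
    for i in range(3, 6):
        img[j][i] = 200
img[3][4] = 201

mode = mean_shift_mode(img, (2, 2))   # feature point found from a nearby start
region = region_grow(img, mode)       # characteristic region around it
```

In practice the search would be started from many positions, and the distinct modes it converges to would serve as the seed points for region growing.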
After the three steps, the processing results are shown in Figures 2 to 6. Figure 2(a) shows that neutrophils are characterized by fine, uniformly distributed granules in the pale cytoplasm, which are also visible in Figure 2(b). Conventional edge detection and threshold segmentation may have difficulty identifying the characteristic points in the graph. After processing, Figure 2(c) shows that all the feature points are identified and the high gray values are highlighted, which prepares for the feature vector extraction in the next step. In the same way, after eosinophils are processed by the proposed algorithm, the coarse granules, which adhere uniformly and are not easy to separate, can still be well labeled, as shown in Figure 3(c). Figure 4 shows that the texture of the basophil consists of uneven coarse granules. After processing, the characteristic regions gather into blocks of different sizes, and the area and uniformity of these blocks can be measured and recognized. The lymphocytes are shown in Figure 5. Compared with Figure 4(c), the regional blocks are relatively uniform and fewer in number, which helps avoid confusion between lymphocytes and basophils. Finally, for the monocyte shown in Figure 6(c), the characteristic regions and gray values are small and lie mainly in the nucleus; the cytoplasm has almost no feature points, which makes the cell easy to identify.

Figure 2. Processing of neutrophil. (a) Neutrophil. (b) Mesh of gray image of (a). (c) Mesh of processed image.

Figure 3. Processing of eosinophil. (a) Eosinophil. (b) Mesh of gray image of (a). (c) Mesh of processed image.

Figure 4. Processing of basophil. (a) Basophil. (b) Mesh of gray image of (a). (c) Mesh of processed image.

Figure 5. Processing of lymphocyte. (a) Lymphocyte. (b) Mesh of gray image of (a). (c) Mesh of processed image.

Figure 6. Processing of monocyte. (a) Monocyte. (b) Mesh of gray image of (a). (c) Mesh of processed image.
After processing with the mean-shift clustering and region growing algorithms, the characteristics of the five types of WBCs are highlighted. The feature regions are contracted into independent blocks, which can be distinguished by gray level, area, and distribution density.
Leukocyte classification
When applying SVM for classification, a considerable number of representative WBCs are usually taken from the data to be processed as a training set. In addition, a certain number of WBCs were taken as samples for testing. Features were extracted from the training set and used to train the SVM classifier to obtain the classification template. Finally, the images to be classified were classified using this template. The process of classification is shown in Figure 7.

Figure 7. Flowchart of training and classification by SVM.
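The train-then-classify flow described above can be illustrated with a minimal binary linear SVM trained by batch sub-gradient descent on the regularized hinge loss. This is a sketch under assumptions, not the classifier configuration used in this work: the kernel, parameters, and multi-class strategy are not stated in the text, the toy feature vectors are synthetic, and the names `train_linear_svm` and `predict` and the values of `lam`, `lr`, and `epochs` are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Fit a binary linear SVM (labels +1 / -1).

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))
    by batch sub-gradient descent.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0  # gradient of the regulariser
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # sample violates the margin
                gw = [g - yi * xj / n for g, xj in zip(gw, xi)]
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(model, x):
    """Classify one feature vector with a trained (w, b) model."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Synthetic, already-normalised two-dimensional feature vectors (not paper data).
train_X = [[2, 0], [3, 1], [-2, 0], [-3, -1]]
train_y = [1, 1, -1, -1]
model = train_linear_svm(train_X, train_y)  # the "classification template"
label = predict(model, [2.5, 0.5])          # classify an unseen sample
```

For the five leukocyte classes, one such binary classifier per class (one-vs-rest) would be a common way to extend the sketch.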
Leukocyte feature extraction
1. Computing the feature values and constructing feature vectors
The feature values are as follows: average gray value of feature points (
In formula (7), di represents the distance between two adjacent feature points.
2. Analysis of feature vectors
Data in Table 1 are analyzed as follows. The mean gray values
Table 1. Feature vectors of WBCs.
WBCs: white blood cells.
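The feature values named in this section (the number n of feature points, their average gray value, and the distance between adjacent feature points) can be sketched as follows. This is an illustration only: reading "adjacent" as "nearest neighbor" is an assumption, the function name `texture_features` is hypothetical, and the paper's own formulas are not reproduced here.

```python
import math

def texture_features(points, grays):
    """Return (n, mean_gray, mean_dist) for a set of feature points.

    points: list of (row, col) feature-point coordinates.
    grays:  gray value at each feature point, in the same order.
    Assumption: the distance d_i of point i is taken to its nearest neighbour.
    """
    n = len(points)
    if n == 0:
        return 0, 0.0, 0.0
    mean_gray = sum(grays) / n
    if n == 1:
        return 1, mean_gray, 0.0
    dists = []
    for a, (ya, xa) in enumerate(points):
        nearest = min(math.hypot(ya - yb, xa - xb)
                      for b, (yb, xb) in enumerate(points) if b != a)
        dists.append(nearest)
    return n, mean_gray, sum(dists) / n
```

The resulting tuple could then be combined with region areas into the feature vector passed to the classifier.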
Robustness testing
A common recognition problem is the weakening of texture. For instance, experts identified both Figure 8(a) and (c) as basophils. Compared with the standard texture in Figure 8(c), the texture in Figure 8(a) is weaker and smoother; thus, with the traditional GLCM and LBP, Figure 8(a) is more easily misidentified as a lymphocyte. With the proposed method, the processed data in Table 2 can be identified correctly. Although the

Figure 8. Processing of basophils of the same type with differing texture. (a) Weak texture. (b) Mesh of processed (a). (c) Standard texture. (d) Mesh of processed (c).
Table 2. Comparison of basophil and lymphocyte feature vectors.
Note: Images 1 and 2 in Table 2 are shown in Figure 8(a) and (c), and image 3 is Figure 4(a); all three are basophils, whereas image 4 is Figure 5(a), which is a lymphocyte.
Another situation involves the small number of lymphocytes without visible cytoplasm, which are easily mistaken for basophils. For the lymphocyte in Figure 9, the n value is 57, whereas basophils have n values greater than 110; thus, such cells can be easily identified by the present algorithm.

Figure 9. Special lymphocyte without cytoplasm.
Another error-prone condition is the variant lymphocyte shown in Figure 10. Because a roundness of approximately 1 is a morphological feature of lymphocytes, deformed lymphocytes are often misidentified as non-WBCs. With the proposed algorithm, the number n of feature points is 38, which matches the characteristics of lymphocytes and avoids this error.

Figure 10. Deformed lymphocyte.
In short, the proposed algorithm can correctly identify cases in which conventional texture algorithms are error-prone, and it is therefore highly robust.
Results and discussion
Dataset
In this paper, two sets of data were used in the experiment. The first set was obtained from the First Affiliated Hospital of Fujian Medical University. The experimental smears were prepared through conventional methods in the hospital and stained with Wright staining. All WBCs collected from the peripheral blood and bone marrow cell images were identified and classified by laboratory specialists. For cell image acquisition, an OLYMPUS BX51 microscope and a Nikon high-performance color digital camera were used. The blood smears were observed under the microscope at 100× magnification. The field of view was positioned over the area where the WBCs were concentrated and then transferred to camera mode. The microscope was fine-tuned to bring the WBCs to an appropriate position in the image. Finally, the digital camera was used to capture the cell images. The second set of data was downloaded from the ALL-IDB1 (Acute Lymphoblastic Leukemia Image Database 1) and ALL-IDB2 datasets.23
To verify the practicability of this algorithm, different types of peripheral blood and bone marrow cell images were tested, and representative images were selected for further analysis. The samples were pre-processed and then identified using the proposed algorithm. A comparison of the experimental results was also conducted.
Experimental results
Table 3 shows that, with conventional morphological methods, neutrophils have significant multi-core features, resulting in a high recognition rate (as high as 94.6%), whereas lymphocytes and basophils have similar morphologies, and thus their recognition rate was less than 83%. The morphological characteristics of monocytes were similar to those of the other four types of WBCs, causing the accuracy rate to drop to 90.8%.
Table 3. Recognition rate with geometric features.
Table 4 shows the results of combining morphological features with the proposed algorithm for texture feature extraction. The correct recognition rate was significantly improved. The recognition rate for basophils was 100%, because their dark granular texture was easily extracted by the proposed algorithm and the number of samples was relatively small. The recognition rate for lymphocytes also increased from 83.0% to 93.1%, because the number n of feature points serves as a good marker. The recognition rates for monocytes and eosinophils were not significantly improved, because juvenile monocytes were misidentified as lymphocytes, whereas the primary texture of eosinophils is very similar to that of neutrophils, which makes even manual recognition difficult.
Table 4. Recognition rate with this work.
Table 5 shows that the correct recognition rates for basophils are similar across the compared algorithms, and that the proposed algorithm performs best for the other four leukocyte types. When the number of samples was increased to 500, the recognition rate decreased, because special cases that are difficult to identify were absent from the smaller sample. However, the recognition rate remained high, reflecting good robustness. The experimental data in Table 5 are based on the ALL-IDB database.
Table 5. Objective evaluation of the classification results.
Conclusion
This paper makes full use of the feature information of the cell image, including its color information, gray-scale information, shape and size, and distance transformation. It also uses the gradient vector flow active contour to extract leukocytes, with good segmentation results. In addition, a method for extracting natural texture features based on mean-shift clustering was proposed and successfully used to identify and classify human peripheral blood and bone marrow leukocytes automatically. The experiments show that the proposed algorithm has good robustness and practicability and a better recognition rate. The proposed algorithm is not perfect, and some limitations exist. It cannot detect all the WBCs in some complex cell images, and it sometimes regards a few non-WBCs as WBCs. Adhesion segmentation for the diagnosis of complicated abnormal cells in bone marrow diseases remains a great challenge. Hence, finding a more effective detection method based on the present method is the direction of our future work. In addition, WBC classification could be implemented with the currently popular convolutional neural networks, which would greatly reduce the time cost.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors would like to acknowledge the financial support provided by the National Science Foundation (Grant No. 60873186) and the Educational Commission of Fujian Province (Grant No. JAT160075), China, for the research, authorship, and publication of this paper.
