Abstract
For a fruit picking robot, accurately identifying fruits growing in the natural environment with machine vision is an essential prerequisite for picking. This article presents a vision system for a fruit picking robot that performs fruit location and three-dimensional model reconstruction. First, the color and shape features of the fruit are combined to reconstruct the actual contour of overlapped and sheltered fruits. Second, the least square method is used to reconstruct the three-dimensional model of each fruit from the spatial coordinates corresponding to its image location. Finally, fruit picking experiments in a laboratory environment verify the feasibility of the proposed vision system. Three parameters, Segmentation Error, Intersection Over Union, and False Negative Rate, are used to evaluate the performance of the algorithm. The average Segmentation Error, Intersection Over Union, and False Negative Rate of the geometry-based fruit location algorithm were 6.36%, 87.9%, and 6.72%, respectively. The experimental results showed that the average computation time of the algorithm was 3.2 s and that the reconstructed three-dimensional model matched the size and position of the fruits in the actual scene. The results can be applied to the vision system of a fruit picking robot.
Introduction
With the development of science and technology, agricultural equipment is becoming increasingly automatic and intelligent. 1 The fruit picking robot, as one important kind of agricultural equipment, is of great significance for solving the problems of labor shortage, low productivity, and high production cost in agricultural production. 2 During fruit picking, the robot captures images in real time through the camera, and the field of view includes fruits, branches, leaves, and so on. Moreover, the target of the fruit picking robot is easily affected by uneven illumination, overlapping, and occlusion, which leads to false detection. 3,4 Therefore, whether the robot can accurately identify the fruit target, quickly locate the picking point, and then complete the picking without damaging the fruit are important evaluation standards. 5,6
Image segmentation is an important processing task for agricultural robots because subsequent steps, such as spatial location 7,8 and three-dimensional (3D) model reconstruction, 9,10 are based on its results. In fruit picking with machine vision, we need to obtain the fruit region in the image captured by the camera and then acquire the coordinates of the fruit in the world coordinate system. Visual information can be acquired with different equipment, such as an RGB or RGB-D camera, 11,12 a 3D laser scanner, 13,14 or a thermal camera. 15 This article focuses on fruit recognition and location algorithms based on an RGB camera. Research on target location algorithms mainly follows two approaches: traditional image processing algorithms and machine learning algorithms. Zhuang et al. adopted traditional algorithms to extract local texture information and then made the final decision with a histogram intersection kernel-based support vector machine. 16 Tao and Zhou extracted an improved 3D descriptor that fuses 3D geometry features and color features from preprocessed point cloud data and then used an optimized support vector classifier to identify branches, leaves, and fruits. 17 Williams et al. proposed a new fruit scheduling system in which, for fruit detection and location, a fully convolutional network performed object segmentation and the position of each fruit was then acquired through stereo matching. 18 Majeed et al. extracted RGB and depth information from acquired point cloud data to remove the background trees and then used a convolutional neural network (SegNet) to identify the trunk and branches of the tree. 19 The current state-of-the-art deep learning approaches require a trade-off between detection rate and processing time. 18
Object recognition with traditional machine vision methods is mainly based on color, texture, and shape features and combines processes such as color segmentation, thresholding, masking, and edge detection. For instance, Wang et al. transformed the RGB color space to the Lab color space and then adopted the K-means clustering algorithm to recognize occluded apples. 20,21 Xiong et al. combined the improved fuzzy clustering method (FCM) and a random signal histogram to remove the background of nocturnal images in the YIQ color model and then used the Otsu algorithm to identify the fruit from the stem base. 22,23 Chaivivatrakul and Dailey studied texture-based detection of green fruits (bitter melon and pineapple) on plants in the field and recognized the green fruits in the natural environment based on feature classification and region extraction. 24 Rizon et al. combined a morphological operator and texture analysis to isolate overlapped and sheltered mango fruit and then used the randomized Hough transform (RHT) to determine the fruit region and the picking point. 25 Luo et al. extracted the region of overlapping grape clusters with the K-means clustering algorithm, separated the region pixels of double overlapping grape clusters based on the contour intersection points, and then detected the cutting point of each grape cluster according to a geometric constraint. 26 Fu et al. distinguished the fruit calyx from the skin based on color differences, obtained the contact points between adjacent fruits by analyzing the edge information, and then determined the borders of each fruit according to the contact points. 27 Song et al. adopted convex hull theory in the segmentation of overlapped fruits, obtained the effective edges and intersection points of overlapped apples, and then reconstructed the actual contour of each fruit from this effective information. 28,29 Lu and Sang detected the contour fragments of the fruit target and the corners within the edges and then combined the valid contour fragments by analyzing their concavity or convexity, bending degree, and length to reconstruct the actual contour of occluded fruits. 30 Kelman and Linker presented a method for detecting fruit in the tree using shape analysis; they obtained the edges that conformed to the geometric features and located each fruit according to the merged eligible edges. 31 Miao et al. proposed an algorithm combining the Otsu algorithm and the watershed algorithm to recognize and segment overlapped targets in natural environments. 32
In this article, the grasping of spherical fruit in the natural environment is taken as the research object, and the recognition, location, and model reconstruction algorithms of the fruit picking robot are studied. A new image processing method is proposed that reconstructs the actual contour by extracting the effective edge of each fruit. From the image location results, the 3D coordinates are obtained with a binocular camera, and the 3D model of each fruit is then reconstructed from these spatial coordinates.
Description of the location and 3D model reconstruction algorithm
This article distinguishes the fruit target from the background according to color features and then reconstructs the actual contour of overlapped and sheltered fruits according to shape features. The location and model reconstruction algorithm for overlapped and sheltered fruits consists of five steps:
1. Segmenting the fruits from the complex background after image denoising.
2. Obtaining the simply connected domain of each single object from the overlapped and sheltered fruits.
3. Acquiring the pixel coordinates set of the outer contour of each fruit using the eight-connected boundary tracking algorithm and extracting the effective edge from the non-actual contour of overlapped and sheltered fruits based on geometry.
4. Reconstructing the actual contour of each fruit using the least square method according to the effective information.
5. Obtaining the spatial coordinates corresponding to the image location based on binocular stereo vision and then reconstructing the 3D model of each fruit.
Recognition of overlapped and sheltered fruits
The original image acquired by the camera in the natural environment includes fruits, branches, leaves, and so on. In fruit recognition and location, the fruits must first be segmented from the complex background. Clustering algorithms, such as fuzzy c-means and K-means, have been widely used for background segmentation because of their good performance. However, the accuracy of a clustering algorithm depends heavily on its parameters, and an improper choice of parameters may cause the segmentation to fail.
In the natural environment, the color of ripened fruit is mostly close to red or orange, and the background color is mostly close to green, blue, and other cool colors. 33 Because the color difference between fruit and background is obvious, color segmentation is also an effective method for background segmentation. In this article, the normalized color difference is used to segment the fruits from the complex background. The algorithm based on normalized color difference is expressed as
where R, G, B are color components in RGB color space.
Perform image segmentation according to equation (2)
where
As shown in Figure 1, all three algorithms can segment the fruits from the complex background successfully. However, for the clustering algorithms it is difficult to determine the optimal parameters that give the best segmentation in a real scenario. Therefore, the algorithm based on normalized color difference is adopted in this article to recognize the overlapped and sheltered fruits growing in the natural environment.

Recognition results of different methods: (a) original image; (b) K-means clustering algorithm; (c) FCM clustering algorithm; and (d) normalized color difference. FCM: fuzzy clustering method.
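The color-based segmentation step can be illustrated with a minimal sketch. It assumes that the normalized color difference takes the form (R − G)/(R + G + B) and that a pixel is kept as fruit when this value exceeds a fixed threshold; both the formula and the threshold value 0.15 are assumptions made for illustration and may differ from equations (1) and (2). The function names are hypothetical.

```python
def normalized_color_difference(r, g, b):
    """Assumed form (R - G) / (R + G + B); returns 0.0 for a black pixel."""
    total = r + g + b
    return (r - g) / total if total else 0.0


def segment_fruit(image, threshold=0.15):
    """Return a binary mask (1 = fruit candidate) for rows of (R, G, B) tuples.

    The threshold is a hypothetical value, not one reported in the article.
    """
    return [
        [1 if normalized_color_difference(r, g, b) > threshold else 0
         for (r, g, b) in row]
        for row in image
    ]


# Tiny synthetic image: an orange pixel, a green leaf pixel,
# a blue sky pixel, and a black pixel.
image = [
    [(230, 120, 30), (60, 140, 50)],
    [(90, 110, 200), (0, 0, 0)],
]
mask = segment_fruit(image)
```

Because the difference is divided by the total intensity, a bright and a dim pixel of the same hue receive similar values, which is the usual motivation for normalizing a color difference under uneven illumination.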
Segmentation of overlapped and sheltered fruits
In the natural environment, overlapping fruits are common, which severely affects the recognition performance of the fruit picking robot. Therefore, it is essential to identify the accurate position of each fruit among the overlapped fruits and then pick them in turn.
Distance transform is a global operation on a binary image that generates a gray image in which the value of each pixel represents the distance between that nonzero pixel and the nearest zero pixel. After the gray image is normalized, the brightest point indicates the nonzero pixel farthest from any zero pixel, which serves as the foreground marker. The binary image is obtained after image preprocessing, such as mathematical morphology, area thresholding, and binarization. The effect is shown in Figure 2(a).

Process of the segmentation of overlapped and sheltered fruits: (a) binary image; (b) distance transform; (c) markers of the fruit; and (d) watershed algorithm.
This article combines the local peaks of the distance transform with the watershed algorithm to segment overlapped fruits. The distance transform is applied to the binary image; the effect is shown in Figure 2(b). The segmentation boundaries are then obtained by the watershed algorithm; the effect is shown in Figure 2(d). After morphological dilation and preprocessing operations, the simply connected domain of each single fruit is obtained; the effect is shown in Figure 3.

Process of obtaining the simply connected domain of single fruit: (a) morphological dilation and (b) simply connected domain.
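The marker step that precedes the watershed can be sketched in pure Python. The sketch assumes a city-block distance and uses a threshold at a fraction of the maximum distance as a simple stand-in for local peak detection; the `ratio` value and the function names are hypothetical, and the watershed itself is not implemented here.

```python
from collections import deque


def distance_transform(binary):
    """City-block distance from each nonzero pixel to the nearest zero pixel.

    Uses a multi-source breadth-first search and assumes the image
    contains at least one zero (background) pixel.
    """
    rows, cols = len(binary), len(binary[0])
    dist = [[0 if binary[y][x] == 0 else None for x in range(cols)]
            for y in range(rows)]
    queue = deque((y, x) for y in range(rows) for x in range(cols)
                  if binary[y][x] == 0)
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def find_markers(dist, ratio=0.7):
    """Label 8-connected groups of pixels with distance >= ratio * maximum."""
    rows, cols = len(dist), len(dist[0])
    peak = max(max(row) for row in dist)
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for y in range(rows):
        for x in range(cols):
            if dist[y][x] == 0 or dist[y][x] < ratio * peak or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not labels[ny][nx]
                                and dist[ny][nx] > 0
                                and dist[ny][nx] >= ratio * peak):
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return labels, count


# Two 5 x 5 blobs joined by a one-pixel neck, imitating two touching fruits.
binary = [[0] * 13 for _ in range(7)]
for y in range(1, 6):
    for x in list(range(1, 6)) + list(range(7, 12)):
        binary[y][x] = 1
binary[3][6] = 1
dist = distance_transform(binary)
labels, count = find_markers(dist)
```

In this example the two blobs form a single connected region in the binary image, yet the distance map yields two separate markers, one per blob. A watershed seeded with these markers would then place the boundary along the neck.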
As shown in Figure 3(b), the algorithm basically achieves the segmentation of overlapped fruits. However, because of uncertain disturbances, the contour obtained for overlapped and sheltered fruits is not the actual contour. Therefore, this article presents a new method that obtains the effective edge by eliminating invalid pixels and then reconstructs the actual contour of each fruit from the effective edge. It is discussed in more detail in the following sections.
Extraction of effective edge
Because the shape of a spherical fruit is close to an ellipse in the machine vision image, the actual contour of the fruit can be reconstructed once the edges that resemble circular arcs (effective edges) are extracted. After obtaining the simply connected domain of each single fruit, this article adopts the Canny edge detector to extract the outer contour of the simply connected domain, and the pixel coordinates set of the outer contour of each fruit is obtained using the eight-connected boundary tracking algorithm; the effect is shown in Figure 4. Pixel coordinate sets belonging to the same object are represented by the same color.

Process of obtaining the pixel coordinates set of outer contour: (a) Canny edge detector and (b) eight-connected boundary tracking.
The method proposed in this article eliminates the invalid pixels caused by factors such as overlapping, occlusion, and uneven illumination and then reconstructs the actual contour of the fruit with an ellipse, so that each fruit is located from the effective information. The algorithm includes the following steps:
1. Acquiring the set
2. Dividing the set into several groups
3. Traversing the pixel coordinates set after grouping, according to the serial number
4. Obtaining the distribution interval of all center coordinates
5. Adopting the eight-connected boundary tracking algorithm to detect the discrete contour edges after eliminating invalid pixels, and obtaining the pixel coordinates set of all discrete contours belonging to the same object
6. Recording the number of pixels contained in each discrete contour and then obtaining the edges of similar circular arc (effective edge) according to the selection principle
where
As shown in Figure 5, it represents the distribution interval of the center coordinates

Distribution interval of all center coordinates and radii: (a) distribution interval of x0; (b) distribution interval of y0; and (c) distribution interval of r0.

Process of obtaining the effective edge: (a) eliminating invalid pixels and (b) eliminating invalid edges.
where
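The idea of voting for a dominant center and radius can be sketched as follows. The grouping rule and the selection principle are not fully specified here, so the sketch makes its own assumptions: each contour pixel is paired with the pixels `step` positions before and after it, the circle through the three points is computed, and only pixels whose circle falls in the most populated interval of x0, y0, and r0 are kept. The parameters `step` and `width` and all function names are hypothetical.

```python
import math
from collections import Counter


def circumcircle(p1, p2, p3):
    """Center (x0, y0) and radius r0 of the circle through three points.

    Returns None when the points are (nearly) collinear.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-9:
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    x0 = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    y0 = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return x0, y0, math.hypot(x1 - x0, y1 - y0)


def dominant_bin(values, width):
    """Index of the most populated histogram interval of the given width."""
    return Counter(math.floor(v / width) for v in values).most_common(1)[0][0]


def effective_pixels(contour, step=5, width=4.0):
    """Keep contour pixels whose local circle lies in the dominant intervals."""
    circles = {}
    for i in range(len(contour) - 2 * step):
        circle = circumcircle(contour[i], contour[i + step],
                              contour[i + 2 * step])
        if circle is not None:
            circles[i + step] = circle  # keyed by the middle pixel
    if not circles:
        return []
    modes = [dominant_bin([c[k] for c in circles.values()], width)
             for k in range(3)]
    return [contour[i] for i, c in circles.items()
            if all(math.floor(c[k] / width) == modes[k] for k in range(3))]


# Synthetic contour: a half circle (visible fruit edge, center (50, 42),
# radius 22) followed by a straight segment (edge created by an occluder).
contour = [(50 + 22 * math.cos(math.radians(3 * k)),
            42 + 22 * math.sin(math.radians(3 * k))) for k in range(61)]
contour += [(28.0 + k, 42.0) for k in range(1, 21)]
kept = effective_pixels(contour)
```

On this synthetic contour the straight segment produces either collinear triples or circles far from the dominant intervals, so only arc pixels survive. Real contours are quantized to pixel positions, so wider intervals would likely be needed in practice.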
Location of overlapped and sheltered fruits
The least square method is a mathematical optimization method. It finds the fitting function that best matches the input data by minimizing the sum of squared errors and can therefore be used to estimate unknown parameters from a known set of data. According to the extracted effective edge information, we can thus reconstruct the actual contour with an ellipse based on the least square method.
The general expression of ellipse can be described by the vector form
where
To ensure the effectiveness of the solution, the ellipse-specific constraint
where the design matrix
Constructing Lagrange function and calculating the partial derivative, and then the simplified function can be obtained as
where

Location result of overlapped and sheltered fruits: (a) pixels fitting and (b) reconstruction of fruit contour.
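The fitting steps described above (vector form of the ellipse, ellipse-specific constraint, design matrix, Lagrange function) match the widely used direct least-squares ellipse fit. A sketch of that formulation follows, under the assumption that it is the one intended here; the symbols a, D, S, C, and λ are chosen for illustration and may differ from the article's own notation.

```latex
% General conic written as the product of a parameter vector and a monomial vector
F(\mathbf{a}, \mathbf{x}) = \mathbf{a}^{T}\mathbf{x}
  = a x^{2} + b x y + c y^{2} + d x + e y + f = 0,
\qquad
\mathbf{a} = [a\; b\; c\; d\; e\; f]^{T},\quad
\mathbf{x} = [x^{2}\; xy\; y^{2}\; x\; y\; 1]^{T}

% Ellipse-specific constraint (fixes the scale and forces b^2 - 4ac < 0)
4ac - b^{2} = 1
\;\Longleftrightarrow\;
\mathbf{a}^{T} C \mathbf{a} = 1,
\qquad
C = \begin{bmatrix}
0 & 0 & 2 & 0 & 0 & 0\\
0 & -1 & 0 & 0 & 0 & 0\\
2 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}

% Constrained problem; row i of the design matrix D is the monomial vector of pixel (x_i, y_i)
\min_{\mathbf{a}} \lVert D\mathbf{a} \rVert^{2}
\quad \text{subject to} \quad \mathbf{a}^{T} C \mathbf{a} = 1

% Lagrange function and stationarity condition
L(\mathbf{a}, \lambda) = \lVert D\mathbf{a} \rVert^{2}
  - \lambda\,(\mathbf{a}^{T} C \mathbf{a} - 1),
\qquad
\frac{\partial L}{\partial \mathbf{a}} = 0
\;\Rightarrow\;
S\mathbf{a} = \lambda C \mathbf{a},\quad S = D^{T} D
```

In this formulation the ellipse parameters are given by the generalized eigenvector of (S, C) that belongs to the single positive eigenvalue, scaled so that the constraint holds.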
It should be noted that morphological erosion is carried out to smooth the contour during image preprocessing, which shrinks the fruit region. Therefore, the axes of the ellipse are set to 1.1 times the calculated values. The subsequent images are processed in the same way.
Model reconstruction of overlapped and sheltered fruits
Building the 3D model of the fruits is a crucial task because it provides effective size and spatial parameters for the mechanical arm and end effector. In this article, the spatial coordinates corresponding to the image location result are obtained with the binocular camera, and the 3D model of each fruit is then reconstructed from these spatial coordinates.
In the space rectangular coordinate system, the spherical equation can be expressed as
Constructing the equation
Calculating the partial derivative, and then the simplified function can be obtained as
where
According to equations (14) to (17), we can obtain equation (18) as
The center coordinates
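A least-squares sphere fit of this kind can be sketched in pure Python. The sketch uses the common linearization x² + y² + z² = 2·x0·x + 2·y0·y + 2·z0·z + (r² − x0² − y0² − z0²), which is linear in the four unknowns, and solves the resulting normal equations. This may differ in form from equations (14) to (18). The function names are hypothetical, and the synthetic points are generated from the first sphere reported later in the article purely as test data.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("points do not determine a unique sphere")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_sphere(points):
    """Least-squares sphere through 3D points; returns (x0, y0, z0, r)."""
    n = len(points)
    # Shift to the centroid so the normal equations stay well conditioned
    # when the fruit is far from the camera origin.
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    ata = [[0.0] * 4 for _ in range(4)]
    atb = [0.0] * 4
    for px, py, pz in points:
        x, y, z = px - cx, py - cy, pz - cz
        row = (x, y, z, 1.0)
        rhs = x * x + y * y + z * z
        for i in range(4):
            atb[i] += row[i] * rhs
            for j in range(4):
                ata[i][j] += row[i] * row[j]
    a, b, c, d = solve_linear(ata, atb)
    x0, y0, z0 = a / 2.0, b / 2.0, c / 2.0
    r = math.sqrt(d + x0 * x0 + y0 * y0 + z0 * z0)
    return x0 + cx, y0 + cy, z0 + cz, r


# Synthetic points on a spherical cap, imitating the camera-facing side.
centre, radius = (-328.49, 1395.43, 8.94), 24.31
points = [
    (centre[0] + radius * math.sin(t) * math.cos(p),
     centre[1] + radius * math.sin(t) * math.sin(p),
     centre[2] + radius * math.cos(t))
    for t in (math.pi * i / 12 for i in range(1, 6))
    for p in (math.pi * j / 4 for j in range(8))
]
x0, y0, z0, r = fit_sphere(points)
```

Because only the side of the fruit facing the camera is observed, the example deliberately fits a cap rather than a full sphere. With noisy stereo data the estimate of the center along the viewing direction would be the least constrained quantity.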
Results and analysis
We selected 60 images of oranges captured by mobile phone in an orchard scene to test the performance of the algorithm. The 60 images of overlapped and sheltered fruits are divided into three groups: 20 in direct sunlight, 20 in backlighting, and 20 in uniform illumination. Three parameters, Segmentation Error (SE), Intersection Over Union (IOU), and False Negative Rate (FNR), are used to evaluate the performance of the algorithm under direct sunlight, backlighting, and uniform illumination. 20
1. SE represents the error rate of segmentation and is calculated by equation (19)
where
2. IOU represents the proportion of fruit and background pixels that are segmented correctly and is calculated by equation (20)
3. FNR represents the proportion of fruit pixels that are classified mistakenly and is calculated by equation (21)
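Two of these measures can be sketched on binary masks, assuming the standard definitions IOU = TP / (TP + FP + FN) and FNR = FN / (TP + FN), which may differ in detail from equations (20) and (21). SE is left out because its definition depends on equation (19). The function names are hypothetical.

```python
def confusion_counts(predicted, truth):
    """Count true/false positives and negatives over two binary masks."""
    tp = fp = fn = tn = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def intersection_over_union(predicted, truth):
    """Overlap of predicted and true fruit regions divided by their union."""
    tp, fp, fn, _ = confusion_counts(predicted, truth)
    union = tp + fp + fn
    return tp / union if union else 1.0


def false_negative_rate(predicted, truth):
    """Share of true fruit pixels that the segmentation missed."""
    tp, _, fn, _ = confusion_counts(predicted, truth)
    return fn / (tp + fn) if (tp + fn) else 0.0


# Tiny example: 1 = fruit pixel, 0 = background pixel.
truth = [[1, 1, 0],
         [1, 1, 0]]
predicted = [[1, 1, 1],
             [1, 0, 0]]
iou = intersection_over_union(predicted, truth)
fnr = false_negative_rate(predicted, truth)
```

In the example three fruit pixels are found, one is missed, and one background pixel is wrongly marked, which gives an IOU of 3/5 and an FNR of 1/4.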
To demonstrate the effect of the algorithm clearly, Figure 8 shows seven representative results, and the SE, IOU, and FNR values of all 60 images are plotted in Figure 9. Based on the test of 60 images, we conclude that the computation time of the algorithm is related to the area of the fruits in the image, and the average computation time over the 60 images is 3.2 s. As shown in Figure 8, uncertain factors such as sunlight, branches, and leaves turn part of the effective edge of the fruit into invalid edge; the location algorithm nevertheless extracts the effective edge from the non-actual contour and then reconstructs the actual contour of overlapped and sheltered fruits with an ellipse. The results basically fit the actual shape of the fruit.

Some representative experimental results: (a) original image; (b) eight-connected boundary tracking; (c) effective edges extraction; and (d) reconstruction of fruit contour.

Diagram of (a) SE, (b) IOU, and (c) FNR in direct sunlight, backlighting, and uniform illumination.
According to the test results, the average SE is 7.58%, 6.60%, and 4.92%, the average IOU is 86.62%, 87.12%, and 90.13%, and the average FNR is 7.81%, 6.71%, and 5.66% in direct sunlight, backlighting, and uniform illumination, respectively. These results indicate that the geometry-based contour reconstruction algorithm is effective.
Fruit picking experiment
Experimental platform
The fruit picking experimental platform is shown in Figure 11. The image acquisition equipment is a Flea3 FL3-U3-20E4C camera (1600 × 1200) produced by Point Grey (FLIR Systems, Wilsonville, Oregon, USA), and the camera lens is an HS0814J produced by Myutron Inc. (Nishikoiwa, Edogawa-Ku, Tokyo, Japan). The image processing equipment is a portable computer with a 64-bit Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz and 8 GB of RAM. The algorithms are written in Python 3.7. The radii of the three fruits are between 20 mm and 30 mm, and the fruits are medium, small, and large from left to right. The motion mechanism consists of the six-degree-of-freedom mechanical arm AUBO-i10 produced by AUBO (Lianshihu West Road, Mentougou District, Beijing) and the flexible grasping manipulator developed by our group. The external structure of the flexible grasping manipulator is shown in Figure 10.

Flexible grasping manipulator.

Experimental platform diagram.
The flexible grasping manipulator consists of three flexible fingers made of silica gel. It features continuous motion, large-range deformation, and high flexibility. Therefore, it is generally believed that the uncertainty in the picking process can be compensated by the compliance of the flexible grasping manipulator. Moreover, the designed three-finger flexible grasping manipulator can provide enough grasping force to grasp the fruit target while ensuring safe interaction.
Before spatial positioning, the original images are stereo-rectified using the MATLAB calibration toolbox, and the principle of binocular stereovision is then combined with a stereo matching algorithm to obtain the 3D information of the center point of the fruit in the camera coordinate system. It should be noted that, at the edge of the fruit, the background area may be mistaken for the target area and pixel matching may fail while the spatial coordinates are obtained, both of which produce obvious noise data. We eliminate the noise data by keeping the most concentrated interval in the distribution of the spatial coordinates.
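The positioning and noise-removal steps can be sketched as follows, assuming a rectified stereo pair with the standard pinhole relations Z = f·B/d, X = (u − cx)·Z/f, and Y = (v − cy)·Z/f. The calibration values, the interval width, and the rule of keeping the most populated interval plus its two neighbours are all hypothetical choices for illustration, not values or rules taken from the experiment.

```python
import math
from collections import Counter


def disparity_to_point(u, v, disparity, focal_px, baseline_mm, cx, cy):
    """Back-project pixel (u, v) with a given disparity to camera coordinates (mm)."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline_mm / disparity
    return ((u - cx) * z / focal_px, (v - cy) * z / focal_px, z)


def remove_outliers(points, axis=2, width=20.0):
    """Keep points in the most populated interval of one coordinate.

    Points in the two neighbouring intervals are kept as well, so a
    cluster that straddles an interval boundary is not split.
    """
    bins = Counter(math.floor(p[axis] / width) for p in points)
    mode = bins.most_common(1)[0][0]
    return [p for p in points
            if abs(math.floor(p[axis] / width) - mode) <= 1]


# Hypothetical calibration: 1000 px focal length, 60 mm baseline,
# principal point at the center of a 1600 x 1200 image.
pixels = [(800, 600, 50.0), (810, 600, 50.2), (790, 605, 49.8),
          (805, 595, 50.1), (820, 600, 20.0), (800, 610, 49.9)]
cloud = [disparity_to_point(u, v, d, 1000.0, 60.0, 800.0, 600.0)
         for u, v, d in pixels]
kept = remove_outliers(cloud)
```

The fifth pixel has a much smaller disparity, as a background pixel matched at the fruit edge would, so it lands about 3 m away and is discarded, while the five points near 1.2 m are kept.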
In the fruit picking experiment, the 3D coordinates of the center point of the fruit in the camera coordinate system must be converted into coordinates in the manipulator coordinate system. We obtain the 3D coordinates of corresponding points in the camera coordinate system and the manipulator coordinate system and then use a built-in OpenCV function to solve the optimal 3D affine transformation matrix.
Algorithm verification
After stereo rectification of the original images, the process of image location is shown in Figure 12, taking the image captured by the left camera as an example. It should be noted that the fruits do not overlap here because the picking planning algorithm for overlapped and sheltered fruits is not covered in this article; the effectiveness of the image location algorithm has been demonstrated in the previous sections.

Process of image location: (a) left camera image; (b) normalized color difference; (c) eight-connected boundary tracking; (d) effective edges extraction; (e) pixels fitting; and (f) contour reconstruction of fruit.
We use the binocular camera to obtain the disparity map with the stereo matching algorithm and then obtain the spatial coordinates corresponding to the image location. The disparity map is shown in Figure 13(a). The spatial coordinates remaining after outlier removal are then substituted into equation (18) to obtain the corresponding sphere center coordinates and radius. The effect is shown in Figure 13(b). The 3D model of the fruits can then be reconstructed using spheres at the spatial locations. The effect is shown in Figure 14.

Spatial location of fruits target: (a) disparity map and (b) the spatial coordinates after removing outliers.

Reconstruction of 3D model.
As shown in Figure 14, the sphere centers of the reconstructed 3D models are (−328.49, 1395.43, 8.94), (−135.01, 1381.57, −88.88), and (26.17, 1335.17, 118.82), and the radii are 24.31, 21.66, and 27.18; the result of 3D model reconstruction basically matches the size and position of the fruits in the actual scene. The spatial position and 3D model reconstruction results provide effective size and spatial information for the mechanical arm and flexible grasping manipulator. With the base of the mechanical arm fixed, it takes about 90 s for the mechanical arm and the flexible grasping manipulator to pick the three fruits in turn from the initial position. This article takes the large fruit as an example and does not involve cutting off the stalk. The fruit picking experiment is shown in Figure 15.

Process of fruit picking: (a) initial state; (b) contacting with the fruit; (c) picking the fruit; (d) task finished.
The fruit picking experiment is currently carried out in an ideal environment. In a real scenario, however, many factors affect the picking task, such as the instability of the mobile platform and the natural movement of the fruit. These factors should be considered in the follow-up outdoor experiments. In terms of image processing, precise and fast detection of the 3D coordinates will be studied in follow-up research; in terms of the actuator, the grasping accuracy and speed should be increased.
Conclusions
Location and model reconstruction of overlapped and sheltered fruits in an unstructured natural environment is an essential prerequisite for a fruit picking robot to pick successfully. In this article, on the basis of traditional image processing algorithms, an effective edge extraction method based on geometry is proposed to reconstruct the actual contour of the fruit, and the 3D model of each fruit is then reconstructed with the least square method. Sixty fruit images are used to verify the feasibility of the proposed algorithm, and the experimental results show that the average SE, IOU, and FNR are 6.36%, 87.9%, and 6.72%, respectively, in the natural environment. The vision system of the fruit picking robot is verified in a laboratory environment: the spatial distribution matches the fruit positions in the actual scene, and the end effector completes the picking task successfully. These results indicate that the fruit location and model reconstruction algorithm performs well and can be applied to the design of vision systems for fruit picking robots, which has guiding significance and engineering application prospects.
Considering the research direction of our group and the limitations of image samples and hardware devices, traditional image processing algorithms are adopted in our project at present. We believe that the geometry-based effective edge extraction method in this article provides a new idea for the contour reconstruction of overlapped and sheltered fruits in the natural environment. Even when machine learning algorithms are used to recognize the fruits, the method can still be used to reconstruct their actual contours. We also tried a fully convolutional network (FCN) model available on GitHub to segment fruits and branches. Of the 216 fruits in the 60 test images, 185 (about 85.6%) were recognized successfully. Although the evaluation criteria of the two methods are not consistent, this result shows that the method based on machine learning has great potential.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
