Abstract
Visual navigation is widely used in intelligent combine harvesters, but existing algorithms do not recognize the visual navigation line with sufficiently high accuracy under different sunlight conditions. To address this problem, this article proposes a sunlight-robust navigation line extraction method for combine harvesters based on DeepLabV3+. The navigation lines are extracted by constructing a new dataset and predicting the boundary between the areas that have been and have not been cut. Because the DCNN part of DeeplabV3+ is not sufficiently lightweight, an improvement is proposed that incorporates the MobileNetV2 module. In image segmentation, the prediction time is 22.5 ms, and the mean intersection over union (FMIOU) is 0.79. After image segmentation, the navigation lines are drawn for the harvester using a line segment detection algorithm. The proposed method is compared with other mainstream networks, and the prediction results are compared using the line segment detection method. The results show that this method identifies the navigation lines under different sunlight conditions more quickly and with less labeled data than the improved U-Net and than DeeplabV3+ with an Xception backbone. Compared with the traditional method and the improved U-Net, this method achieves good results and reduces the recognition time by 27 and 9 ms, respectively.
Introduction
Intelligent combine harvesters have been widely used in recent years; they can greatly reduce the burden on operators and improve the accuracy and efficiency of operations.1 However, current applications remain at the experimental stage and rely on experienced operators during the process. At present, the intelligent harvester can only work under human supervision in large fields; that is, it cannot perform the work fully autonomously. Furthermore, the harvester cannot use the full cutting width because of inevitable errors in the navigation and steering systems.2 Thus, the accuracy cannot be guaranteed.
In field operations, the sensor that provides navigation for harvesters is mostly RTK-GNSS, which has high accuracy and wide applicability and is currently the main navigation device in field harvesting. The positioning accuracy of RTK-GNSS sensors can be maintained at the centimeter level.3,4 They can be mounted on most agricultural machines and are relatively simple to operate; driverless operation in the field only requires the working range to be selected in advance on the device. However, RTK-GNSS cannot describe the situation ahead, such as obstacles, including rocks, poles and trees. The device is susceptible to interference from other electromagnetic devices,5 and its positioning accuracy can be affected by changing weather conditions.6
To improve the accuracy and better perceive the surroundings, many researchers have added vision sensors. Ma7 developed a control system that integrated satellite navigation and visual navigation using a monocular camera and a satellite module. Ai8 proposed an AGV navigation method that uses a stereo camera to identify the lane lines, obtain the positional deviation of the body relative to the lane lines, and track the path. Chen9 proposed a new method for the online self-motion estimation of combine harvesters using stereo cameras. However, stereo cameras produce a large amount of data, which is somewhat time-consuming to process. There are also active light cameras, which can be classified into structured light cameras10 and time-of-flight (ToF) cameras.11 These cameras directly obtain the depth of an object relative to the camera with high accuracy by projecting light of a specific frequency that is reflected by the surface of the object. However, compared with stereo cameras, they are more expensive and more sensitive to strong sunlight in a large field environment.
To better extract the navigation lines, some researchers have proposed 3D modeling-based approaches, which extract the boundary lines by identifying the different heights of the harvested and unharvested areas. Kneip12 proposed a stereo vision setup to adaptively detect crop and cut edges online. The algorithm uses a graphics processing unit (GPU) to quickly match camera images. This solution is inexpensive and less susceptible to dust interference than LIDAR. Zhang13 developed a machine-vision-based tip detection method that selected the Cr component of the YCbCr color model as the grayscale feature factor and automatically acquired the region of interest (ROI). In 2022, another method14 was proposed to extract the wheat harvesting navigation path with binocular cameras by extracting point clouds and applying polynomial fitting. Traditional image processing methods are mostly based on features such as color and texture and implement algorithms such as clustering and watershed segmentation, which require manual extraction of feature information.15–17 Thus, traditional methods have low efficiency and generality, and their recognition success rate in some large fields is low.
To improve the sunlight robustness and applicability of the system in a large field environment, this article proposes a real-time stereo navigation line extraction algorithm based on DeeplabV3+. Unlike conventional methods, our method can extract navigation lines under strong sunlight; it improves the recognition speed and accuracy compared with U-Net, and improves the recognition speed with slightly reduced accuracy compared with DeeplabV3+ using an Xception backbone.
This article makes the following contributions:
New combine harvester datasets are made available for conducting light robustness studies. An enhanced DCNN module is proposed as an improvement to DeeplabV3+; this module effectively reduces the effect of real sunlight on image segmentation and improves the accuracy of navigation line extraction. The image segmentation method identifies the cut and uncut areas and extracts the navigation lines based on the boundary between them; it can segment the scene under strong light so that the subsequent line segment detection algorithm can extract the navigation lines. Our approach achieves a speedup and good results compared with DeeplabV3+ using Xception as its backbone.
Materials and methods
System description
To ensure that the harvester moves along an accurate path during operation, an unmanned crawler rice combine harvester (model: Kubota EX108) is used as the carrier of the field data collection platform. The cutting platform and roof shake violently while the harvester is moving, which would interfere with the information captured by the camera. In addition, the camera should be installed as high as possible on the harvester to obtain a sufficiently wide field of view. Thus, the camera is mounted at the front of the harvester inside the cab, as shown in Figure 1, and equipped with a stabilizing head.

Figure 1. System schematic.
A stereo camera fixes two identical cameras at definite positions, locates the same subject in the left and right views of the captured images, and calculates the depth information from the difference in the subject's position and the geometry of the two views. In this study, the ZED2i camera is used, which can output the images of the left and right views, IMU information, depth maps and 3D coordinates in the camera coordinate system.
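As a minimal illustration of that calculation, the sketch below recovers depth from disparity via the standard pinhole-stereo relation Z = fB/d; the focal length and baseline values are placeholders, not the ZED2i's calibration.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Standard stereo triangulation: Z = f * B / d.

    disparity_px: horizontal pixel offset of the same point
                  between the left and right views.
    focal_px:     focal length in pixels.
    baseline_m:   distance between the two camera centers in meters.
    """
    d = np.asarray(disparity_px, dtype=np.float64)
    with np.errstate(divide="ignore"):
        return np.where(d > 0, focal_px * baseline_m / d, np.inf)

# Illustrative values only: a 40-pixel disparity with f = 700 px
# and B = 0.12 m gives a depth of 2.1 m.
print(depth_from_disparity(40, 700.0, 0.12))
```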
To ensure that the camera works properly during harvesting, the stereo camera acquisition system consists of a ZED stereo camera (ZED2i, Stereo Labs Inc, San Francisco, CA, USA), whose main parameters are shown in Table 1 and which can work in harsher environments than the previous model, and a laptop (Lenovo i5-8300H, Lenovo Inc, Beijing, China) with Nvidia CUDA for acceleration. Thus, images from the camera can be observed in real time to easily make adjustments.
Table 1. Main parameters of the ZED2i binocular camera.
Detection of crop areas
The extraction of navigation lines mainly relies on distinguishing the boundary between the harvested and unharvested areas, as shown in Figure 2. The harvested area mainly contains the rice stubble left after harvesting, and the unharvested area mainly contains the rice plants to be harvested. Normally, when the harvester performs the harvesting operation, the operator visually keeps the right side of the cutting platform aligned with the boundary of the uncut area, precisely controls the left and right steering, and attempts to keep the harvester working at full cutting width. This manual operation reduces missed cuts, improving the operation efficiency and reducing the operation cost. When a missed cut occurs, the harvester must run another round trip for harvesting, which consumes more fuel and seriously affects the harvesting efficiency.

Figure 2. Schematic diagram of the cut and uncut areas.
During the automatic driving of the harvester, the navigation line can be generated in real time: after the stereo camera acquires the images, the information is submitted to the computer. An algorithm uses the images output by the stereo camera to calculate three-dimensional coordinates, and the computer passes the extracted navigation line to the controller to guide the harvester for efficient operation. An image acquired by a monocular camera usually provides only color and texture information, and it is difficult to obtain depth information, whereas a stereo camera can calculate depth. In this article, a stereo camera is employed to acquire image data from the two viewpoints, which are then input into a neural network for the extraction of navigation lines. Figure 3 shows the whole system.

Figure 3. The structure of the line extraction system.
Dataset acquisition environment
In a large field environment, sunlight remarkably affects the effectiveness of identification, especially under strong sunlight. There is sometimes a significant change in brightness within the same field, which occurs when the harvester turns from facing the sunlight to moving away from it, especially at sunrise and sunset. To enhance the robustness of the recognition system, the dataset was collected under strong and normal light conditions in Nanjing and Changzhou, Jiangsu, on November 10, 2022, and November 25, 2022, respectively. In total, 2877 images were extracted from the video streams, from which 190 images were selected for annotation by an experienced expert.
Data pre-processing
The data collected in the field have high resolution, and processing them is time-consuming. While time is being lost, the combine harvester continues along its original path, which prevents accurate control of the harvester and poses safety hazards. The original picture contains irrelevant factors such as the sky and the paddle wheel. To reduce the computing time and improve the computing efficiency, the region of interest can be extracted. Compared with other scenes, the scene in the farmland varies little, so the ROI does not need to be adjusted frequently. The ROI should contain as few irrelevant factors as possible; as shown in Figure 4, the left side of the ROI is the uncut area, and the right side is the cut area. For consistency, the same area of 204 × 115 pixels was cropped from every image.

Figure 4. Extraction of regions of interest.
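A minimal sketch of the fixed-crop ROI extraction described above; only the 204 × 115 crop size comes from the text, while the crop offsets and the stand-in frame are illustrative assumptions.

```python
import numpy as np

ROI_X, ROI_Y = 150, 95            # illustrative top-left corner (pixels)
ROI_W, ROI_H = 204, 115           # crop size used in this article

def extract_roi(frame: np.ndarray) -> np.ndarray:
    """Crop the fixed 204 x 115 region of interest from a camera frame.

    The farmland scene barely changes between frames, so the crop
    rectangle can be chosen once and reused for the whole sequence.
    """
    return frame[ROI_Y:ROI_Y + ROI_H, ROI_X:ROI_X + ROI_W]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # stand-in camera frame
roi = extract_roi(frame)
print(roi.shape)                                   # (115, 204, 3)
```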
Proposed method
The encoder of DeepLabv3+18 is a DCNN with atrous convolution, which can adopt a common classification network such as ResNet19 as its backbone. The output enters an atrous spatial pyramid pooling (ASPP) module to introduce multi-scale information; the receptive field can be expanded arbitrarily to extract as many features as possible. The multi-scale information is processed by a 1 × 1 convolution and then upsampled by a factor of four. Compared with DeepLabv3, DeepLabv3+ introduces a decoder module, which fuses the low-level features with the high-level features from these two sources and performs a 3 × 3 convolution and four-times upsampling to obtain the output, as shown in Figure 5.

Figure 5. Schematic diagram of the DeeplabV3+ network model.
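To illustrate the ASPP idea, the following simplified PyTorch sketch runs parallel atrous (dilated) convolutions at several rates and fuses them with a 1 × 1 convolution; it is a didactic reduction, not the actual DeeplabV3+ implementation.

```python
import torch
import torch.nn as nn

class SimpleASPP(nn.Module):
    """Simplified atrous spatial pyramid pooling: parallel 3x3 convolutions
    with different dilation rates cover different receptive fields, and
    their outputs are concatenated and fused with a 1x1 convolution."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # padding = dilation keeps the spatial size unchanged for a 3x3 kernel
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 256, 32, 32)            # dummy encoder feature map
print(SimpleASPP(256, 64)(x).shape)        # torch.Size([1, 64, 32, 32])
```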
During real-time recognition in the field, the size and inference time of the deep learning model's feature extraction network are constrained. Choosing a lightweight network reduces the number of parameters and the inference time and improves the recognition speed. Thus, MobileNetV2 and Xception are used as the backbone networks in the DCNN part of the encoder to facilitate subsequent comparison. To better compare the recognition results, an improved U-Net20 was also trained. The models were tested for crop area segmentation by resizing the 2877 test images to 512 × 512 pixels before input. Model training and testing were implemented in Python 3.9 using PyTorch 1.9.0. The CPU and GPU used for image training were an i7-12700KF and an NVIDIA 3090Ti, respectively. The NVIDIA 3090Ti has 10,752 CUDA cores and 24 GB of video memory, which satisfies our requirements.
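The article does not give its model construction code; one plausible way to assemble DeeplabV3+ with a MobileNetV2 backbone is the segmentation_models_pytorch library, as in this sketch (an assumption, not the authors' implementation).

```python
import torch
import segmentation_models_pytorch as smp

# DeeplabV3+ with a lightweight MobileNetV2 encoder; two classes
# (cut area vs. uncut area), encoder pre-trained on ImageNet.
model = smp.DeepLabV3Plus(
    encoder_name="mobilenet_v2",
    encoder_weights="imagenet",
    in_channels=3,
    classes=2,
)

x = torch.randn(1, 3, 512, 512)   # images are resized to 512 x 512 as above
with torch.no_grad():
    logits = model(x)             # shape: (1, 2, 512, 512)
```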
The average precision rate P, average recall rate R, mean intersection over union FMIOU, and inference time are used to evaluate the image recognition and classification results. They are calculated as follows:
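With $TP_i$, $FP_i$ and $FN_i$ denoting the per-class true positives, false positives and false negatives over the $k$ classes, the standard definitions are:

$$P = \frac{1}{k}\sum_{i=1}^{k}\frac{TP_i}{TP_i + FP_i}, \qquad
R = \frac{1}{k}\sum_{i=1}^{k}\frac{TP_i}{TP_i + FN_i}, \qquad
F_{MIOU} = \frac{1}{k}\sum_{i=1}^{k}\frac{TP_i}{TP_i + FP_i + FN_i}$$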
Analysis and discussion
To evaluate the line recognition accuracy, this study evaluates the recognition effect along two dimensions. For image segmentation, the evaluation includes common image recognition metrics and the inference time. For line detection, the evaluation metrics are the lateral error and the angular error of the navigation line. The lateral error is the pixel distance, at the bottom row of the image, between the line detected after Canny edge detection and the ground truth. The angular error is the angle between the ground truth and the tangent of the recognized curve at the bottommost part of the image.
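A minimal sketch of both error measures as just defined; representing each line by two endpoints in image coordinates is our assumption, not a detail from the article.

```python
import math

def lateral_error_px(pred_bottom_x, truth_bottom_x):
    """Lateral error: horizontal pixel distance between the detected
    line and the ground-truth line at the bottom row of the image."""
    return pred_bottom_x - truth_bottom_x

def angular_error_deg(pred_p1, pred_p2, truth_p1, truth_p2):
    """Angular error: angle between the detected line (tangent at the
    image bottom) and the ground-truth line, in degrees."""
    a_pred = math.atan2(pred_p2[1] - pred_p1[1], pred_p2[0] - pred_p1[0])
    a_true = math.atan2(truth_p2[1] - truth_p1[1], truth_p2[0] - truth_p1[0])
    diff = math.degrees(a_pred - a_true)
    return (diff + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)

# e.g. a detected line 4 px right of truth, tilted roughly -3 degrees:
print(lateral_error_px(106, 102))                                # 4
print(angular_error_deg((106, 115), (100, 0), (102, 115), (102, 0)))
```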
Image segmentation evaluation
Two sets of images were randomly exported from the strong light environment and the normal light environment, and each set contains the left and right views. Figure 6 shows the segmentation results of the three networks.

Figure 6. Recognition effect of three different networks.
The unharvested area is shown in red, and the harvested area is shown in green. In this study, only data from the harvester operating on a straight line were processed, so data from turns were not collected. Although the collected data were from the harvester moving along the same straight line, the areas on both sides varied. All three methods are satisfactory in that they can roughly distinguish the two sides, but there are differences in the details. Under strong light, all networks can tell the two areas apart; under normal light, the U-Net segmentation shows irregular red patches in the separated area, whereas the two DeeplabV3+ models behave differently. DeeplabV3+ also yields relatively smoother boundaries than U-Net. In terms of details, the MobileNetV2 model labels part of the boundary line as background in the data under normal sunlight.
The accuracy, recall, average intersection over union, and inference time of different models for recognition and classification were separately counted. The results are shown in Table 2.
Table 2. Recognition classification effects of different networks.
There is no significant gap among the models in terms of precision, recall and mean intersection over union. DeeplabV3+ with MobileNetV2 as the backbone has better recall and inference time than the other models but slightly lower precision than U-Net. Thus, both U-Net and DeepLabV3+ have strong feature extraction capabilities and can extract detailed features. DeeplabV3+ with Xception as the backbone has the longest inference time, almost double that with MobileNetV2. DeeplabV3+ with MobileNetV2 has a significantly better inference time than the other two models: its inference time is 22.5 ms, almost 50% faster than the U-Net model. The reason is the lightweight structure of MobileNetV2: fewer parameters imply less computation and a shorter inference time. Given this large reduction in inference time, a moderate decrease in accuracy is considered acceptable. This effect is also inseparable from the ROI extraction, which significantly reduces the prediction time.
Linear detection evaluation
The Canny edge detection algorithm is a popular edge detection method composed of five steps: Gaussian filtering, pixel gradient calculation, non-maximum suppression, hysteresis thresholding and isolated weak edge suppression. In this article, the Canny edge detection algorithm21 is used to extract edges from the recognized images for line fitting.
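A minimal OpenCV sketch of this step on a synthetic mask; the hysteresis thresholds and the cv2.fitLine straight-line fit are illustrative choices, since the article does not give its exact parameters.

```python
import cv2
import numpy as np

# mask: single-channel segmentation output (0 = cut, 255 = uncut),
# here a synthetic stand-in for a real network prediction.
mask = np.zeros((115, 204), dtype=np.uint8)
mask[:, :100] = 255                         # left half "uncut"

edges = cv2.Canny(mask, 50, 150)            # hysteresis thresholds: 50 / 150
ys, xs = np.nonzero(edges)                  # boundary pixel coordinates
pts = np.column_stack([xs, ys]).astype(np.float32)

# Fit one straight navigation line through the boundary pixels.
vx, vy, x0, y0 = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01).ravel()
print(f"direction=({vx:.2f}, {vy:.2f}), point=({x0:.1f}, {y0:.1f})")
```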
Figure 7 shows the fitted results. The paths detected from the three networks' outputs are almost identical to the estimated paths and the ground truth, although there are offsets and angle differences. The output image from U-Net shows obvious errors in the unharvested area, which reflects U-Net's weak boundary detection; its edge detection results are not sufficiently accurate, and the fitted line has a larger curvature than those of the other methods. The DeeplabV3+ output images show almost no obvious errors, but the segmentation output of the MobileNetV2 variant has an obvious black background between the two regions, which makes it easier to obtain smooth straight lines during line fitting; this is correlated with the network structure of DeeplabV3+.

Figure 7. Extraction of network output results using the Canny edge detection algorithm.
Straight line detection
The straight line detection uses the pixel offset at the bottom of the image as the indicator, measured between the image after Canny edge detection and the manually labeled image. Figure 8 shows the results; the points on the graph are sampled every 23 frames. Under normal sunlight, both MobileNetV2 and Xception have errors of −10 to 10 pixels, while the errors of U-Net fluctuate between −30 and 20 pixels. In the U-Net output, harvested regions are classified as unharvested, and the estimated edges shift from the actual edges. In the middle of the strong light sequence (frame 570) and the normal light sequence (frame 691), there are larger errors because interference factors in the middle region make the direction difficult to determine. In most frames, MobileNetV2 and Xception output similar deviations, whereas U-Net outputs the largest errors.

Figure 8. Pixel errors of different networks under strong and normal sunlight: (a) normal sunlight; (b) strong sunlight.
Angle detection
Calculating the heading deviation involves determining the angle between the predicted straight line and the ground truth. This information is used to adjust the path of the harvester and ensure that it stays on track, as shown in Figure 9. The points on the graph are also sampled every 23 frames. From start to finish, under both strong and normal light, the vast majority of angular errors for MobileNetV2 and Xception are between −10° and 10°, whereas U-Net shows a range of approximately −20° to 20°, with additional angular deviations of over 40° in the positive direction. In most frames, MobileNetV2 and Xception have similar errors, and U-Net has the largest errors.

Figure 9. Angular errors of different networks under strong and normal sunlight: (a) normal sunlight; (b) strong sunlight.
Table 3 compares the performance of the segmentation methods using statistical analysis; the statistics reported for each network and light condition are summarized below.

Table 3. Performance of different networks under strong and normal light.

| Network | Light | Mean angular error (°) | Max angular error (°) | CV (angular) | Mean lateral error (px) | Max lateral error (px) | CV (lateral) |
|---|---|---|---|---|---|---|---|
| U-Net | Strong | 5.18 | 43.465 | 0.77 | 2.81 | 23 | 1.06 |
| U-Net | Normal | 4.74 | — | 1.01 | 3.81 | 27 | 1.03 |
| DeeplabV3+ (MobileNetV2) | Strong | 3.14 | 14.093 | 0.64 | 0.66 | 10 | 0.98 |
| DeeplabV3+ (MobileNetV2) | Normal | 3.33 | 15.471 | 0.69 | 1.06 | 8 | 0.68 |
| DeeplabV3+ (Xception) | Strong | 4.30 | 18.594 | 0.57 | 0.93 | 15 | 0.88 |
| DeeplabV3+ (Xception) | Normal | 3.95 | 26.584 | 0.79 | 1.32 | 12 | 0.72 |
The results show that MobileNetV2 is the best option among the three models, except that it has a higher coefficient of variation under strong light than Xception. A possible reason is that a small part of the black background is retained in the recognition result, which leaves room for the straight-line fit and decreases the offset in the calculation. This result also reflects that DeeplabV3+ is better than U-Net at edge segmentation.
Discussion
In light of the comprehensive results presented in this study, it becomes evident that the proposed method offers substantial promise for the precise segmentation of harvested and unharvested areas, particularly in the context of repetitive and similar scenes during harvesting, typically characterized by the presence of three distinct classes. The method has proven to be both effective and efficient in area segmentation, a crucial consideration for optimizing agricultural operations.
When evaluating the performance of the proposed method, we find it to be on par with, if not surpassing, previous approaches. Notably, the segmentation results achieved by our proposed approach closely align with those of earlier methods. It is worth highlighting that previous methods often exhibited processing times exceeding 50 ms per frame,11,12,22 a drawback that limited their real-world applicability, particularly under strong sunlight conditions.
In contrast, our method stands out by demonstrating notable advancements in practical performance. This can be attributed to the efficient architectural design of DeeplabV3+ with the MobileNetV2 backbone, which significantly reduces the model parameters, and to a dataset constructed to cover strong sunlight scenarios.
While these results hold promise for agricultural navigation, two limitations should be noted. Further dataset enrichment is needed to accommodate variations in rice maturity levels. Additionally, evaluation based solely on pixel-wise errors should be supplemented with field data to validate real-world performance. Future research may explore network optimization for increased efficiency.
Conclusions
In this study, an innovative method for boundary identification during the harvesting process is introduced. The approach comprises multiple key components, including the extraction of ROI, the application of advanced deep learning techniques via DeepLabV3+ for the segmentation of harvested and unharvested areas, and the fitting of boundary lines using the Canny edge detection algorithm. The results obtained from this study demonstrate a notable enhancement in identification performance, particularly under challenging high-intensity sunlight conditions.
MobileNetV2 has been employed as the backbone network for feature extraction, surpassing alternative choices in terms of efficiency. The synergistic combination of Canny edge detection for boundary extraction and MobileNetV2 for recognition significantly enhances the overall effectiveness of the proposed method.
One of the most compelling aspects of this approach is its real-time detection capability, with an inference time of just 22.5 ms at a resolution of 204 × 115 pixels. This achievement aligns with the specific requirements of field operations, showcasing its practicality in real-world settings.
The technological advancements presented in this study have the potential to revolutionize agricultural practices by vastly improving operational efficiency, minimizing waste, and boosting crop yields.
Footnotes
Authors’ contributions
Gong Cheng contributed to writing-original draft preparation and methodology. Chengqian Jin contributed to writing-review and editing. Man Chen contributed to validation and methodology. All authors have read and agreed to the published version of the manuscript.
Declaration of Competing Interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the National Key Research and Development Plan Project (2021YFD2000503), the National Natural Science Foundation of China (Nos. 32171911 and 32272004), the General Project of the Jiangsu Natural Science Foundation (BK20221188) and the Jiangsu Agriculture Science and Technology Innovation Fund (No. CX(20)1007).
Author biographies
Gong Cheng received the B.E. degree in mechanical engineering from Nanjing Institute of Technology, Nanjing, China, in 2021. He is currently pursuing the M.Sc. degree in agricultural mechanization engineering with the Chinese Academy of Agricultural Sciences, Beijing, China. His research interests include machine vision and agricultural robots.
Chengqian Jin received the B.Sc. degree in agricultural mechanization engineering from Huazhong Agricultural University, Hubei, China, in 1995, the M.Sc. degree in agricultural mechanization engineering from the Nanjing University of Science and Technology, Nanjing, China, in 2006, and the PhD degree in agricultural mechanization engineering from Nanjing Agricultural University, Nanjing, in 2014. He is currently a Researcher with the Chinese Academy of Agricultural Sciences and a Professor with the Shandong University of Technology. His major is in agricultural mechanization engineering.
Man Chen received the B.E. degree in electronic information science and technology from Nanjing Agricultural University, Nanjing, China, in 2011, and the PhD degree in agriculture electrification and automation from Nanjing Agricultural University, Nanjing, in 2016. He is currently an associate researcher with the Chinese Academy of Agricultural Sciences. His major is in agricultural electrification and automation engineering.
