Abstract
The wilting of leaves caused by disease poses risks to both harvest yield and the environment. Timely detection of disease signs on leaves is therefore crucial, enabling farmers to prevent outbreaks and safeguard their crops. However, manually inspecting all diseased leaves at scale demands substantial time and human effort. In this study, we propose an effective method for automated disease detection on leaves using images captured with mobile phones. The proposed technique is an ensemble of four models with distinct roles: (1) a ResNeXt50 model processing high-quality images, (2) a ViT model processing low-quality images, (3) an EfficientNet B5 model combining self-training with noisy inputs, and (4) a MobileNetV3 model operating on segmented images. Experimental results demonstrate that the proposed method outperforms several state-of-the-art methods, reaching an F1-score of 90% on our TLU-Leaf dataset and 87% on the Cassava Leaf Disease dataset.
Introduction
Countries with monsoon climates have diverse organisms and fertile soils, which are advantageous for agricultural development. However, the monsoon climate is also ideal for fungi, bacteria, and viruses, which cause plant diseases such as leaf rollers and yellowing leaves every year. These diseases can spread quickly, affecting large areas of crops if they are not predicted and controlled, and farmers can lose a large share of their agricultural products as a result. Furthermore, some pests also pose a threat to the environment. We therefore need to construct classification models, data processing pipelines, and machine learning models [1] to predict the presence of plant diseases in advance.
It is therefore necessary to detect leaf diseases early and prevent unwanted consequences. In addition, appropriate measures for detecting pests [25] and diseases can help map the locations of affected crops in mountainous areas and provide timely information to local authorities and farmers.
Detecting foliar diseases poses a significant challenge due to the intricate nature of visual imagery [3], which has driven demand for specialized analysis of foliar indicators. Our objective is to leverage a diverse set of image features for disease detection, given their effectiveness, precision, accessibility, and cost-efficiency. In expansive fields with dense crop populations, close-range equipment offers higher accuracy, but it is resource-intensive in terms of labor, expense, and installation time. Conversely, a composite-feature approach can be readily deployed and operated across extensive regions, and results obtained by applying multiple models to images show promising potential. In this study, we propose a strategy built on a multi-model framework with an intermediate anomaly-extraction module for analyzing images of infected plants. The proposed approach is lightweight, making it suitable for implementation on low-cost embedded boards.
Our contributions are threefold and are summarized as follows:
The remainder of this paper is structured as follows. Section 2 discusses relevant previous studies. Section 3 presents our method. The experimental evaluation is shown in Section 4. Finally, some concluding remarks and a brief discussion are provided in Section 5.
Related works
The method proposed in this article encompasses five research domains: deep learning (DL) models with high-quality images, DL with low-quality images, DL with self-supervised learning on noisy inputs, DL with image segmentation, and finally a framework that fuses deep learning models to address all four aspects of the task jointly.

Four cases of ensemble deep learning.
MnLeaf framework
In this study, we present a multiple-model neural network, named the MnLeaf framework, aimed at effectively classifying various leaf diseases. Our methodology integrates four distinct neural network architectures. Specifically, we employ the ResNeXt50 architecture for precise leaf disease detection on high-quality images. Where image quality is relatively low, we deploy the Vision Transformer (ViT) architecture to ensure accurate disease classification. To address the challenges posed by noisy images, we introduce the EfficientNet B5 architecture, which serves as both a teacher and a student model. Additionally, we leverage the MobileNetV3 architecture to extract pertinent segmentation features, as depicted in Figure 2. Through this comprehensive approach, our study contributes to the advancement of leaf disease classification using modern neural network architectures and carefully curated datasets.

Overall architecture of the proposed method.
The proposed multi-channel deep learning architecture combines ResNeXt50, EfficientNet B5, and MobileNetV3 for feature extraction with ViT layers for sequence prediction across four channels, joined by fully connected layers as illustrated in Figure 2. Each channel, or head, consists of a stack of Conv1D layers followed by max-pooling layers. The Conv1D layers, configured with 128 filters, directly map and abstract the inputs to extract features; feature extraction is performed by convolutional operators applied over kernels, and the feature maps are computed as described in Table 3 below. To enable feature extraction at different resolutions, we employ the ResNeXt50 and ViT branches to feed the Conv1D layers of the corresponding channels. The output of each Conv1D layer is fed into a max-pooling layer, which reduces the size of the learned features by summarizing them into separate elements without compromising accuracy. The EfficientNet B5 branch uses an output configuration of 128 units for its Conv1D and max-pooling layers; this branch is well suited to learning from noisy input images, which improves classification, helps minimize overfitting, and enhances model accuracy. Finally, the MobileNetV3 branch learns segmentation features for identifying the objects to be classified, in this case leaves and leaf diseases. The outputs of the four channels are flattened and concatenated, then passed through three fully connected layers that map the learned features to the different classes. The final output comes from a dense layer with a softmax activation that computes the probability distribution over the predicted classes.
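To make the wiring concrete, the following PyTorch sketch assembles four feature-extraction branches, per-channel Conv1D/max-pooling heads with 128 filters, and a three-layer fully connected classifier with softmax output. Everything beyond those stated ingredients (module names, feature dimensions, hidden widths) is an assumption for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class MnLeafFusion(nn.Module):
    """Minimal sketch of the four-channel fusion design described above.

    `backbones` is a list of four feature extractors (e.g. ResNeXt50, ViT,
    EfficientNet B5, MobileNetV3), each mapping an image tensor to a
    feature vector; `feat_dims` lists their output sizes (assumptions).
    """

    def __init__(self, backbones, feat_dims, num_classes=5):
        super().__init__()
        self.backbones = nn.ModuleList(backbones)
        # One Conv1D (128 filters) + max-pooling stack per channel,
        # mirroring the per-head design in the text.
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(1, 128, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool1d(2),
            )
            for _ in feat_dims
        )
        fused = sum(128 * (d // 2) for d in feat_dims)
        # Three fully connected layers; hidden widths are illustrative.
        self.classifier = nn.Sequential(
            nn.Linear(fused, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, images):
        # `images` holds the four preprocessed variants of one input image.
        outs = []
        for x, bb, head in zip(images, self.backbones, self.heads):
            f = bb(x).unsqueeze(1)           # (B, 1, feat_dim)
            outs.append(head(f).flatten(1))  # flattened per-channel features
        logits = self.classifier(torch.cat(outs, dim=1))
        return torch.softmax(logits, dim=1)  # class probability distribution
```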
In the next section, we will present in detail the used features and the architecture of the four networks.
To streamline and enhance the quality of input records, we performed three preprocessing steps (shown in Figure 2): data dimension reduction, adding noise to the data, and generating segmentation images. First, we applied several preprocessing techniques to reduce the data dimension; for instance, we used PyTorch functions to reduce the image size from 512×512 to 224×224 while retaining essential features. Additionally, we used downsampling within convolutional neural networks (CNNs) to reduce the data dimension further.
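A minimal sketch of this resizing step, assuming a standard torchvision pipeline with ImageNet normalization (the normalization and file path are our assumptions; the paper only specifies the 512×512 to 224×224 reduction):

```python
import torchvision.transforms as T
from PIL import Image

# Hypothetical preprocessing pipeline: downscale the 512x512 captures to
# 224x224 while applying the normalization most pretrained backbones expect.
resize_pipeline = T.Compose([
    T.Resize((224, 224)),          # 512x512 -> 224x224
    T.ToTensor(),                  # HWC uint8 -> CHW float in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])

img = Image.open("leaf_sample.jpg").convert("RGB")  # path is illustrative
x = resize_pipeline(img).unsqueeze(0)  # add batch dim: (1, 3, 224, 224)
```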
After reducing the data dimension, we introduced image noise using the GradInversion [63] technique. GradInversion uses ResNet-50 as a backbone to introduce random noise and generate a natural-looking image, reconstructing the input from gradients as part of an optimization process. The addition of noise yields two images: (i) the original image; and (ii) an image with noise.
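The full GradInversion procedure optimizes an image against network gradients; as a deliberately simplified stand-in that produces the same (original, noisy) pair of outputs, an additive-Gaussian version might look like this (the noise scale `sigma` is illustrative):

```python
import torch

def add_noise(x, sigma=0.1):
    """Return the original image tensor and a noisy copy.

    Simplified stand-in for the gradient-based noise synthesis described
    above; `sigma` is an illustrative noise scale, not from the paper.
    """
    noisy = (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)
    return x, noisy  # (i) original image, (ii) image with noise
```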
Finally, we generated segmentation images to isolate individual objects, focusing on objects affected by disease, such as plant leaves. We employed a segmentation technique [64] developed by Facebook's research team, which effectively separates all objects within the image; we then processed the output to select only the diseased plant leaves and classify them accordingly. The primary objective of our segmentation approach is to generate a binary mask for each leaf and for each type of disease on the leaves, and to associate each mask with a specific plant based on real agricultural field images. We therefore segment individual leaf samples and plants simultaneously, which lets us determine the shape and size of each leaf, as well as the disease condition on the leaves, and is highly suitable for morphological analysis.
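Since the exact segmentation model behind [64] is not spelled out here, the sketch below uses torchvision's Mask R-CNN purely as a stand-in to show how per-leaf binary masks can be extracted; the score and mask thresholds are assumptions:

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Stand-in for the instance segmentation step; the paper's technique [64]
# may differ, so this only illustrates extracting per-object binary masks.
model = maskrcnn_resnet50_fpn(weights="DEFAULT").eval()

@torch.no_grad()
def leaf_masks(image, score_thresh=0.7):
    """Return one binary mask per detected object in a (3, H, W) float image."""
    out = model([image])[0]                 # dict with boxes/labels/scores/masks
    keep = out["scores"] > score_thresh
    # Soft masks come back as (N, 1, H, W) floats; threshold to binary.
    return out["masks"][keep, 0] > 0.5
```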
ResNeXt50
The conventional approach to enhancing model accuracy is to deepen or widen the network architecture. However, as the number of hyperparameters (channels, filter sizes, etc.) grows, network design and computational costs become harder to manage. Building on ResNet, ResNeXt integrates ResNet's block-stacking strategy with the grouped convolutions of the Inception structure. This synthesis raises the accuracy of the recognition model without amplifying its complexity: experimental results show that ResNeXt achieves superior classification performance with a simpler topology and without additional parameters. Table 1 presents the detailed architecture of the proposed network.
Detailed ResNeXt50 network to produce the first modality
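A plausible way to instantiate this branch is to take the pretrained ResNeXt50 (32×4d) from torchvision and strip its classification head so it emits pooled features; whether the paper builds the branch exactly this way is an assumption:

```python
import torch.nn as nn
from torchvision.models import resnext50_32x4d

# ImageNet-pretrained ResNeXt50 with the classifier removed, so the branch
# emits pooled feature vectors instead of class logits.
backbone = resnext50_32x4d(weights="DEFAULT")
feat_dim = backbone.fc.in_features   # 2048 for ResNeXt50 (32x4d)
backbone.fc = nn.Identity()          # expose the 2048-d pooled features
```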
ViT
The proposed architecture in Table 2, referred to as ViT, is devised for classifying objects with correlations between similar and dissimilar classes. While coarse classification can be achieved through relatively simple global cues, finer classification requires highly localized discriminative regions. In this study, to capture these subtle distinguishing features, prototypes are applied to condensed image patches; this differentiates between categories at a more detailed level and allows more precise classification by focusing on localized features.
Detailed ViT network to produce the second modality
The scaled dot-product attention serves as a pivotal element within the Multi-Head Self-Attention (MHSA) layer [51] of the Transformer architecture. In the MHSA, a set of queries $Q$, keys $K$, and values $V$ is combined as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V$$

To mitigate the issue of exceedingly small gradients and enhance training stability, every element within the $QK^{T}$ matrix is multiplied by the constant factor $1/\sqrt{d_k}$, where $d_k$ is the dimension of the keys.
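In code, the scaled dot-product attention above reduces to a few lines (a generic sketch, independent of any particular ViT implementation):

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # scaled QK^T
    return torch.softmax(scores, dim=-1) @ V
```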
EfficientNet B5
We combined proportional (compound) scaling to optimize parameter efficiency and training speed for the EfficientNet B5 model. In addition, we integrated Noisy Student training, a semi-supervised learning approach that builds on self-training and knowledge distillation, as presented in Figure 2. A teacher model is trained on labeled images and produces pseudo labels for unlabeled images; a student model is then trained on a blend of labeled and pseudo-labeled images. Notably, the student model is of equal or larger size, and controlled noise is injected into the student during learning, which further enhances performance.
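A hedged sketch of one such Noisy Student round: the teacher pseudo-labels the unlabeled pool, then the (equal-or-larger) student trains on labeled plus pseudo-labeled batches with input noise. The loop structure and names are illustrative, not the paper's exact procedure:

```python
import torch

def noisy_student_round(teacher, student, labeled_dl, unlabeled_dl,
                        optimizer, loss_fn, noise):
    """One illustrative Noisy Student round: pseudo-label, then train.

    `noise` is any input perturbation (e.g. augmentation); dropout noise
    stays active on the student via .train().
    """
    teacher.eval()
    pseudo = []
    with torch.no_grad():
        for x in unlabeled_dl:                   # teacher labels unlabeled data
            pseudo.append((x, teacher(x).argmax(dim=1)))

    student.train()                              # dropout noise remains active
    for x, y in list(labeled_dl) + pseudo:       # blend labeled + pseudo-labeled
        optimizer.zero_grad()
        loss = loss_fn(student(noise(x)), y)     # noisy input to the student
        loss.backward()
        optimizer.step()
```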
MobileNetV3
In this section, we introduce MobileNetV3 as the backbone network architecture for the mobile semantic segmentation task. We employ the specialized segmentation head known as R-ASPP, as initially proposed in [xxx]. R-ASPP is a streamlined adaptation of the Atrous Spatial Pyramid Pooling module [xxx], comprising two branches: a 1 × 1 convolution and a global average pooling operation. To harness richer features, we apply atrous convolution within the final block of MobileNetV3, and we add a skip connection from low-level features to incorporate finer-grained details.
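torchvision ships a Lite R-ASPP head on a MobileNetV3-Large backbone that matches this design; whether the paper used this exact implementation is an assumption:

```python
import torch
from torchvision.models.segmentation import lraspp_mobilenet_v3_large

# Lite R-ASPP segmentation head on MobileNetV3-Large, pretrained weights.
seg_model = lraspp_mobilenet_v3_large(weights="DEFAULT").eval()

# The forward pass on a (B, 3, H, W) batch returns a dict whose "out"
# entry holds per-pixel class logits of shape (B, num_classes, H, W).
with torch.no_grad():
    logits = seg_model(torch.rand(1, 3, 224, 224))["out"]
```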
Fusion
Given an input image, after it passes through the four models we obtain a predictive label that amalgamates the distinctive feature characteristics of each deep learning model. Tailoring the fusion process as proposed is expected to improve the accuracy of classifying diseased leaves. We summarize this process in the algorithm outlined in Table 3.
Fusion model of four features
Finally, our model gains significant performance improvements over the previous version at almost no computational cost. In detail, the new h-swish non-linearity addresses a weakness of its swish base by replacing the computationally expensive sigmoid with a piecewise-linear analogue built on ReLU6:

$$\text{h-swish}(x) = x \cdot \frac{\mathrm{ReLU6}(x + 3)}{6}$$
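Equivalently, in a couple of lines of PyTorch:

```python
import torch
import torch.nn.functional as F

def h_swish(x):
    """h-swish(x) = x * ReLU6(x + 3) / 6, a piecewise-linear swish analogue."""
    return x * F.relu6(x + 3.0) / 6.0

# PyTorch also provides this activation directly as torch.nn.Hardswish.
```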
Dataset
The model is trained and evaluated on two datasets: the Cassava Leaf Disease Classification dataset, comprising 15,000 images, and the TLU Leaf Disease dataset (shown in Fig. 3), comprising 2,500 images. The TLU Leaf Disease dataset has 5 labels (bamboo leaf disease, verticillium wilt, leaf scorch, turning yellow, plant wilting), each with 500 images; in Figure 3, each row shows 3 sample images of the same label. Both datasets encompass five distinct labels corresponding to different diseases.

Sample images from our collected dataset (TLU-Leaf).
The experimental outcomes were evaluated using metrics including accuracy, F1 score, precision, recall, Kappa, and the confusion matrix. With $TP$, $TN$, $FP$, and $FN$ denoting true positives, true negatives, false positives, and false negatives, these metrics can be computed as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \quad \kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance.
We also computed the Area Under the Receiver Operating Characteristic Curve (AUC) as a metric for assessing diagnostic accuracy.
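These metrics map directly onto standard library calls; a sketch using scikit-learn (our tooling assumption, since the paper does not name its metrics library):

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    """Compute the reported metrics for multi-class predictions.

    `y_pred` holds predicted labels, `y_score` per-class probabilities
    of shape (n_samples, n_classes); macro averaging is an assumption.
    """
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall":    recall_score(y_true, y_pred, average="macro"),
        "f1":        f1_score(y_true, y_pred, average="macro"),
        "kappa":     cohen_kappa_score(y_true, y_pred),
        "auc":       roc_auc_score(y_true, y_score, multi_class="ovr"),
        "confusion": confusion_matrix(y_true, y_pred),
    }
```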
We train our models in a synchronous configuration on four Tesla M10 GPUs, using the PyTorch framework with a momentum of 0.9. The initial learning rate is 0.01 with a batch size of 32 (8 images per chip), and a learning rate decay of 0.001 is applied every three epochs. For regularization, we use a dropout rate of 0.8 and an L2 weight decay of 1e-5. Our image preprocessing follows the approach detailed in the backbone framework. We also apply exponential moving averaging with a decay factor of 0.9999, and all convolutional layers incorporate batch normalization with an averaging decay rate of 0.99.
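A hypothetical reconstruction of this configuration in PyTorch; the StepLR gamma is an assumption because the stated "decay of 0.001 every three epochs" is ambiguous, and the placeholder model is illustrative:

```python
import torch

model = torch.nn.Linear(10, 5)  # placeholder for the MnLeaf network

# SGD is assumed from the stated momentum; lr, momentum, and weight decay
# follow the values in the text.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-5)

# "Decay of 0.001 every three epochs" read here as a multiplicative gamma.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3,
                                            gamma=0.001)

# Exponential moving average of the weights with decay 0.9999.
ema = torch.optim.swa_utils.AveragedModel(
    model, avg_fn=lambda avg, p, n: 0.9999 * avg + (1 - 0.9999) * p)
```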
Experiment setup
We design our extensive empirical study to answer the following key research questions (RQs): RQ1: How much does MnLeaf improve deep learning performance compared to classical deep learning methods? RQ2: How does each scenario in MnLeaf contribute to correct deep learning? RQ3: How do the key parameters affect the performance of MnLeaf?
For RQ2, we carried out a total of six distinct scenarios, which are outlined in Table 4. For RQ1, we present the experiments conducted on the three foundational network baselines. Furthermore, for both RQ3 and RQ4, we employed the two synthetic datasets to assess model performance in the presence of noise. The results are averaged over experimental runs on the two datasets.
Six scenarios with different networks
Comparison With Four Baselines (RQ1)
We conducted a comparative analysis between our proposed method and recently published approaches on the Cassava Leaf Disease dataset. The results in Table 5 compare AUC, Kappa, accuracy, F1 score, recall (sensitivity), and precision. Remarkably, our method surpasses all of these previous works in accuracy, F1 score, recall, and precision. This also demonstrates that using a greater number of features leads to higher accuracy in detecting diseased cassava leaves. However, because the visual characteristics of the various cassava leaf diseases are quite similar, the accuracy remains below 90%.

Training accuracy and loss progress with the TLU-Leaf dataset.
Performance comparison with Cassava Leaf Disease dataset
We compared models by ranking their overall accuracy, as presented in Table 6; results are reported in percentages. For each model, the respective authors ran experiments on various datasets, such as the PlantVillage dataset and the Paddy Leaf dataset. Our proposed MnLeaf model achieves relatively high overall accuracy: it outperforms DenseNet 121 [53], a CNN-based architecture (modified LeNet) [58], modified U-Net segmentation [59], ResNet152, and InceptionV3 [60] on the PlantVillage dataset, and it also surpasses the baseline EfficientNet models in their respective rankings. In general, our deep learning model achieves a reasonably high level of accuracy, but its weakness is model size, which is quite large (156M parameters).
Performance comparison with other studies
The outcomes of the six scenarios on the TLU-Leaf dataset are presented in Table 7. The table covers four models capable of detecting leaf diseases given the input data and the corresponding network. Among the results, the sixth experiment performs best, achieving an accuracy of 90% and an F1 score of 90%. This elevated performance can be attributed to the combination of diverse methodologies with a well-suited neural network. By effectively synthesizing multiple input features through neural networks, the AUC, Kappa, precision, recall, and F1 score all surpass 90%, with only marginal deviations from the accuracy benchmark. Given that our data still contain considerable noise, such as camera shake and blur from different devices, an AUC of 91% on the TLU-Leaf dataset is relatively good. Our future efforts will therefore focus on improving data quality and enhancing the models' ability to recognize various types of diseased leaves.
Scenario results with TLU-Leaf Dataset
Because we used early stopping, training halted after 48 epochs. Figure 4 depicts the model's training progression over time, showing accuracy and loss trends for the sixth scenario on the TLU-Leaf dataset. Both training and validation accuracy trend upward, while training and validation losses decrease as the number of training iterations increases. The close proximity of the curves indicates the absence of overfitting.
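For reference, the early-stopping rule can be captured in a few lines (a generic sketch; the patience value is an assumption, since the paper does not state it):

```python
class EarlyStopping:
    """Minimal early-stopping helper; `patience` is illustrative."""

    def __init__(self, patience=5):
        self.patience, self.best, self.bad_epochs = patience, float("inf"), 0

    def step(self, val_loss):
        """Return True when training should stop."""
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0  # improvement: reset
        else:
            self.bad_epochs += 1                      # no improvement
        return self.bad_epochs >= self.patience
```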
The mean ROC curves for the sixth model are depicted in Figure 5. The figure illustrates that our proposed multiple-model architecture has attained a substantial AUC of 91%, signifying strong performance in discriminating between negative and positive samples.

Mean ROC curves for the classifiers on the test set.
Table 8 presents the detailed disease classification across leaf conditions in the sixth scenario. The model performs well, with macro-average precision, recall, and F1 score of 81%, 79%, and 79%, respectively; the micro average and weighted average stand at approximately 91%. Notably, the model reaches 91% accuracy without extensive tuning. The support is 166 for each disease-afflicted leaf class (bamboo leaf disease, verticillium wilt, leaf scorch, turning yellow, plant wilting) and 830 in total for accuracy, the micro average, and the weighted average.
Leaf Disease classification with TLU-Leaf dataset
Conclusion
We have presented a multiple-model neural architecture for the detection of leaf diseases. The MnLeaf framework detects diseased leaves using four main models: ResNeXt50, ViT, EfficientNet B5, and MobileNetV3. From the initial high-quality images, we derive lower-resolution images, noisy images, and segmented images; these four types of images are used to train the four models, respectively. The proposed architecture achieves an accuracy of 99.62% when trained and tested on the PlantVillage dataset, outperforming other studies that use only one model (Table 6). To evaluate the effectiveness of the proposed architecture, we also collected a dataset of 2,500 phone-captured images. Tests against several modern models, including ResNeXt50, ViT, EfficientNet B5, MobileNetV3, and multi-network rules, show that our proposed method reaches a 90% F1-score, better than the other methods. These results show the promise of the proposed method.
The limitations of current methods for plant disease detection can be summarized as follows: (1) large model size; (2) limited data availability; (3) limited use of real-world images; (4) the need for more accurate disease classification; and (5) disease stage identification. In ongoing work, we are focusing on detecting diseases at different positions on plants and at various disease stages. The developed model can also be integrated into a widely applicable IoT-based system, enabling early detection of leaf diseases from remote distances.
Acknowledgement
This work was supported by Thuyloi University, Vietnam.
