Abstract
Soil moisture content (SMC) is an important parameter affecting tea growth. Appropriate soil moisture improves tea quality and secures tea yield, so the soil water content needs to be monitored regularly. However, traditional SMC prediction algorithms suffer from low accuracy and low efficiency. This paper constructs and evaluates a hybrid prediction model (AOA-SVM) that combines the arithmetic optimization algorithm (AOA) with a support vector machine (SVM) to predict SMC in tea plantations. Grey relation analysis (GRA) and Pearson correlation analysis are adopted to select features for the prediction model, and the correlation of SMC with soil temperature (ST), atmospheric temperature (AT), and soil electrical conductivity (SEC) is analyzed. The optimal penalty parameter (
Keywords
Introduction
Moisture is indispensable to the growth and development of tea. Changes in soil moisture in tea plantations directly affect the fertility of tea trees and the yield and quality of tea leaves. Most tea is planted in mountainous and hilly areas, where the soil layer is thin, water retention is poor, and permeability is strong, which often causes seasonal drought and water shortage. It is therefore especially important to monitor the soil moisture content (SMC) of tea plantations regularly and to construct a reasonable SMC prediction model to guide irrigation and fertilization management. The quality of SMC prediction influences the level of water-saving irrigation and crop yield. 1
Research on SMC prediction models has received considerable attention. There are four principal approaches to predicting soil water content: the soil water balance prediction model, 2 the soil hydrodynamic prediction model, 3 the empirical prediction model, 4 and the neural network prediction model. 5 The relationship between soil water content and storage variation has been analyzed using an empirical model. 6 Blanchard et al. 7 determined the maximum and minimum values of soil temperature in different seasons and derived an expression for soil moisture as a function of season. Jackson et al. 8 proposed a crop water deficit index and analyzed its correlation with soil water content. Robinson and Hubbard 9 constructed a soil moisture prediction model for upland soils. A soil moisture index calculated from earth resource satellite (ERS) data has been used to predict the moisture of land profiles. 10 Factors affecting soil moisture have been analyzed, and a prediction model based on the water balance principle has been constructed. 11 The accuracy of moisture prediction can also be ensured by limiting the number of samples with guaranteed accuracy. 12,13 The water balance principle has been used to model the relationship of evaporation, soil water holding capacity, and other factors with soil moisture content. 14 A multiple regression prediction model for vineyards has been constructed. 15 A soil moisture monitoring method based on hyper-spectral remote sensing has been proposed, with five regression methods employed for inversion. 16 A soil moisture retrieval model has been designed to predict SMC for different soil types using hyper-spectral data from 470 to 2400 nm. 17 Hyper-spectral data have been used with deep learning to determine the soil moisture content of a partially vegetated surface. 18
The soil water content at a depth of 5 cm, effective rainfall, and evaporation were used as inputs to multiple regression models to predict the average soil water content in the root zone. 19 Pedotransfer functions (PTFs) for soil moisture prediction were constructed using multiple linear regression (MLR), artificial neural networks (ANNs), and Rosetta models. 20 Support vector regression (SVR) optimized by chaotic methods was used for short-term soil moisture content prediction with the aim of coping with agricultural droughts. 21 Random forest (RF), cubic regression (CR), and gradient boosting machines (GBM) were used to predict water management parameters for irrigation management in a fully remote manner. 22 A quantitative estimation model for soil water content was developed using the relationship between the ratio spectral index (RSI), the difference spectral index (DI), and the normal spectral index. 23 A fractional differential generalization of the one-dimensional Richards equation was used to predict soil water content, and the model was calibrated using a particle swarm algorithm. 24 Parameters such as temperature and humidity were selected to predict soil water content using a gray neural network. 25 A nonlinear model for soil water content prediction with fewer training factors was constructed with an extreme learning machine (ELM). 26 Time series, 27 back propagation (BP) neural networks and long short-term memory (LSTM) deep learning, 28 and support vector machine (SVM) models 29,30 have also been applied to soil water content prediction. Several state-of-the-art meta-heuristic algorithms, for instance beetle antennae search (BAS) 31,33 and bald eagle search (BES), 34,36 are widely used for complex systems and are less prone to becoming trapped in local minima. The α-β-Divergence-Generalized Recommender is also a good method for achieving highly accurate prediction. 37
The methods above have several problems: their applications are limited, empirical models depend on experience, and BP neural networks are slow to converge. LSTM networks place high demands on data volume and network structure, and SVM models are sensitive to parameter selection.
The objectives of this article are to (1) provide a theoretical basis for feature selection in the soil moisture content prediction model by combining Grey relation analysis (GRA) and Pearson correlation analysis, and investigate the correlation between SMC and ST, SEC, and AT; and (2) propose a soil moisture content prediction model based on an SVM optimized by the AOA, which offers an approach to soil water content monitoring.
Section “Introduction” introduces the role of soil water content in crop growth and reviews the research progress and methods of soil moisture content prediction models; section “Materials and method” describes the research methodology and the data acquisition methods, and constructs the prediction model based on the support vector machine optimized by the arithmetic optimization algorithm; section “Experimental results and analysis” presents the experimental results and analysis; finally, conclusions and discussion are presented.
Materials and method
Experimental data
The study site is the tea plantation of the Guangxi State-owned Fuhu Overseas Chinese Farm (109°21'N, 24°82′E), with a planting area of roughly 58.4 ha. MEC20 sensors developed by Dalian Zheqin Technology Co., which are widely used in precision irrigation, are used to collect the SMC, ST, and SEC of the tea plantation.
The ST, SMC, and SEC data of the tea plantation studied in this article were gathered from 10 August 2020 to 27 March 2021. The collected data were first aggregated by day, outliers and missing values were handled, and a total of 216 data samples were obtained to form the training set and test set of the model.
Arithmetic optimization algorithm (AOA)
The AOA is a meta-heuristic algorithm proposed by Abualigah et al. 38 in 2021. It updates candidate positions with the four arithmetic operators (addition, subtraction, multiplication, and division), using them to balance global dispersion of the search with local accuracy, and it is promising for applications. The algorithm has an initialization phase, an exploration phase, and a development (exploitation) phase.
In the initialization phase, Math Optimizer Accelerated (MOA) is designed to control the search phase of the algorithm, which is defined as equation (1)
A random number
In the exploration phase, multiplication and division search strategies are used to perform a global search in the search area to find the optimal solution. Multiplication and division search strategies depend on a random number
MOP is defined as equation (3)
In the development (exploitation) phase, addition and subtraction search strategies are used to perform a local search in the search area, as modeled in equation (4)
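The three phases can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the default values (`moa_min=0.2`, `moa_max=1.0`, `alpha=5.0`, `mu=0.499`), the random numbers drawn per dimension, and the clamping of candidates to the bounds follow commonly published AOA settings rather than anything stated in this article.

```python
import random


def moa(c_iter, m_iter, moa_min=0.2, moa_max=1.0):
    """Math Optimizer Accelerated: rises linearly from moa_min to moa_max."""
    return moa_min + c_iter * (moa_max - moa_min) / m_iter


def mop(c_iter, m_iter, alpha=5.0):
    """Math Optimizer Probability: falls from about 1 to 0 over the run."""
    return 1.0 - c_iter ** (1.0 / alpha) / m_iter ** (1.0 / alpha)


def aoa_minimize(objective, lb, ub, n_agents=20, m_iter=100, mu=0.499, seed=0):
    """Minimize `objective` inside the box [lb, ub] (assumed defaults)."""
    rng = random.Random(seed)
    dim = len(lb)
    eps = 1e-12  # avoids division by zero when MOP reaches 0

    # Initialization phase: random agents inside the bounds; keep the best.
    agents = [[rng.uniform(lb[j], ub[j]) for j in range(dim)]
              for _ in range(n_agents)]
    best = min(agents, key=objective)[:]
    best_fit = objective(best)

    for c_iter in range(1, m_iter + 1):
        a, p = moa(c_iter, m_iter), mop(c_iter, m_iter)
        for _ in range(n_agents):
            cand = []
            for j in range(dim):
                step = (ub[j] - lb[j]) * mu + lb[j]
                if rng.random() > a:
                    # Exploration phase: division or multiplication.
                    if rng.random() < 0.5:
                        x = best[j] / (p + eps) * step
                    else:
                        x = best[j] * p * step
                else:
                    # Development (exploitation) phase: subtraction or addition.
                    if rng.random() < 0.5:
                        x = best[j] - p * step
                    else:
                        x = best[j] + p * step
                cand.append(min(max(x, lb[j]), ub[j]))  # clamp to the bounds
            fit = objective(cand)
            if fit < best_fit:
                best, best_fit = cand, fit
    return best, best_fit
```

For example, `aoa_minimize(lambda p: sum(v * v for v in p), [-10.0, -10.0], [10.0, 10.0])` searches a two-dimensional sphere function, which is used here only as a toy objective.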
Support vector machine (SVM)
The basic idea of the SVM is to find the line or hyperplane that correctly divides the training data set with the largest geometric margin. 39
Suppose the input sample, which is defined as equation (5)
So the optimization problem can be expressed in equations (7) and (8)
The pairwise problem of the above problem is shown in equation (9)
The decision function of SVR is defined as equation (10)
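For orientation, the textbook ε-insensitive SVR formulation has the shape sketched below. This is the standard form and is not guaranteed to match the notation of equations (7) to (10): the symbols $w$, $b$, $\varphi$, $C$, $\varepsilon$, $\xi_i$, $\xi_i^*$, $\alpha_i$, $\alpha_i^*$, and $K$ are assumptions introduced here.

```latex
% Primal problem: objective and constraints
\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\lVert w \rVert^2
  + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right)
\quad \text{s.t.} \quad
\begin{cases}
  y_i - w^{\top}\varphi(x_i) - b \le \varepsilon + \xi_i, \\
  w^{\top}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\
  \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n
\end{cases}

% Dual problem
\max_{\alpha,\,\alpha^*} \
  -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
     (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j)
  - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*)
  + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*)
\quad \text{s.t.} \quad
\sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \qquad
0 \le \alpha_i,\ \alpha_i^* \le C

% Decision function
f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + b
```

In this form, $C$ is the penalty factor and $K$ is the kernel function, the two quantities whose choice affects SVR performance.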
SVR model performance is affected by the kernel function and the penalty factor, which therefore need to be optimized.
The loss function measures how well a predictor classifies or predicts the input data. The hinge loss function is commonly used in the SVM algorithm and is given in equation (11)
Therefore, the loss function of SVM is a combination of the hinge loss function and regularization loss, which is given in equation (12)
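A small sketch may make the combined loss concrete. It assumes a linear scoring function, labels in {-1, +1}, a mean hinge term, and an L2 regularization term weighted by a hypothetical coefficient `lam`; the exact weighting in equation (12) may differ.

```python
def hinge_loss(y_true, score):
    """Hinge loss for one sample with label y_true in {-1, +1}."""
    return max(0.0, 1.0 - y_true * score)


def svm_objective(w, b, X, y, lam=0.01):
    """Mean hinge loss plus L2 regularization (lam is an assumed weight)."""
    scores = [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]
    data_term = sum(hinge_loss(yi, si) for yi, si in zip(y, scores)) / len(X)
    reg_term = lam * sum(wj * wj for wj in w)
    return data_term + reg_term
```

A sample that is classified correctly with a margin of at least 1 contributes no hinge loss, so only the regularization term remains for well-separated data.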
SVM model optimized by AOA
The steps of the algorithm are as follows:
(1) Initialize the relevant parameters of the AOA and the SVM.
(2) Calculate the fitness value of the AOA as the MSE of the values predicted by the SVM, which is defined as equation (13).
(3) Calculate the MOA value to select a search strategy and update the position.
(4) Find the optimal position value.
(5) Apply the optimized values to the SVM for prediction.
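The fitness evaluation in step (2) can be sketched as follows, assuming each AOA position encodes a penalty factor `c` and an RBF kernel width `gamma`. To keep the example dependency-free, `train_stand_in` is a hypothetical kernel ridge regressor standing in for the SVM trainer; it is not an SVM, and all function names here are illustrative.

```python
import math


def mse(y_true, y_pred):
    """Mean square error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def train_stand_in(X, y, c, gamma):
    """Hypothetical stand-in trainer (kernel ridge, NOT an SVM).

    A larger c means weaker regularization, loosely mirroring the
    role of the SVM penalty factor.
    """
    K = [[rbf(a, b, gamma) + (1.0 / c if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    alpha = solve(K, y)
    return lambda x: sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X))


def make_fitness(train, X_tr, y_tr, X_val, y_val):
    """Build the fitness function: position (c, gamma) -> validation MSE."""
    def fitness(position):
        c, gamma = position
        model = train(X_tr, y_tr, c, gamma)
        return mse(y_val, [model(x) for x in X_val])
    return fitness
```

An optimizer then only needs to call `fitness(position)` for each candidate; swapping in a real SVM trainer with the same `(X, y, c, gamma)` signature would leave the rest unchanged.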
The detailed flowchart of the AOA-SVM algorithm is presented in Figure 1.

Figure 1. Flowchart of the arithmetic optimization algorithm and support vector machine (AOA-SVM).
Experimental results and analysis
Model evaluation criteria
To evaluate the performance of the model, the evaluation indicators are the mean square error (MSE), the coefficient of determination (
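The indicators reported in the result tables (MSE, MAE, ME, and the coefficient of determination) can be computed as in the sketch below. The definitions are the usual textbook ones; the sign convention for ME (predicted minus true) is an assumption, since the defining equations are not reproduced here.

```python
def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_error(y_true, y_pred):
    """Mean error, taken here as mean(predicted - true) (assumed sign)."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Multiplying `r2` by 100 gives the percentage form in which the determination coefficients are quoted in the text.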
Model feature
To improve the accuracy of model prediction, model features need to be extracted. Principal component analysis (PCA), latent Dirichlet allocation (LDA), Grey relation analysis (GRA), and Pearson correlation analysis are common methods for feature extraction. A latent factor analysis-based approach has also been used for online sparse streaming feature selection, which can significantly improve quality. 41
In this article, GRA and Pearson correlation analysis are adopted to determine the feature selection of the model.
GRA can find out the relationship between various parameters and determine the important factors that affect the target value. The degree of relevance is defined as equation (18)
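A common way to compute the grey relational degree is sketched below. The min-max normalization, the distinguishing coefficient `rho=0.5`, and the function names are assumptions based on the standard GRA procedure and may differ from equation (18); input sequences are assumed to be non-constant.

```python
def min_max(seq):
    """Scale a non-constant sequence to the range [0, 1]."""
    lo, hi = min(seq), max(seq)
    return [(v - lo) / (hi - lo) for v in seq]


def grey_relational_grades(reference, comparisons, rho=0.5):
    """Return the grey relational degree of each comparison sequence.

    reference   -- the target sequence (for example SMC)
    comparisons -- dict mapping a name to a sequence of equal length
    rho         -- distinguishing coefficient (0.5 is a common choice)
    """
    ref = min_max(reference)
    deltas = {
        name: [abs(r - c) for r, c in zip(ref, min_max(seq))]
        for name, seq in comparisons.items()
    }
    all_d = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_d), max(all_d)
    if d_max == 0.0:
        # Every comparison sequence coincides with the reference.
        return {name: 1.0 for name in deltas}
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
        for name, ds in deltas.items()
    }
```

A degree closer to 1 means the comparison sequence follows the reference sequence more closely, which is how the parameters are ranked against SMC.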
Table 1 shows that SMC has the strongest correlation with SEC, followed by AT, and is least affected by ST; however, the relational degrees of all three comparison parameters are greater than 0.65. Therefore, all three parameters are used as feature inputs of the model.
Table 1. Relevance analysis results.
ST: soil temperature; AT: atmospheric temperature; SEC: soil electrical conductivity.
Pearson correlation analysis measures the degree of linear correlation between two variables. The hypothesis test for Pearson's correlation coefficient requires the following conditions: the two sets of experimental data are normally distributed and paired; the correlation gap between the experimental data is not too large; and each set of experimental data is sampled relatively independently. Therefore, before performing Pearson correlation analysis, a normality test must be carried out on all collected variable samples.
A




Figures 2 to 5 show that most points fall on the trend line of expected versus actual values, indicating that the original sample data are approximately normally distributed.
The Pearson correlation coefficients are shown in Table 2.
Table 2. Correlation analysis results.
ST: soil temperature; AT: atmospheric temperature; SEC: soil electrical conductivity.
Table 2 shows that the Pearson correlation coefficient between SMC and SEC is 0.9487, which is a very strong correlation. SMC is strongly correlated with ST and moderately correlated with AT, and both correlations are positive; therefore, all three parameters are used as feature inputs of the model.
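The Pearson coefficient itself is the covariance of two paired samples divided by the product of their standard deviations. The following is a minimal sketch of that standard formula with an illustrative function name; the inputs are assumed to be paired, equally long, and non-constant.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

The result lies between -1 and 1, with the sign giving the direction of the relationship and the magnitude giving its strength.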
Model performance analysis
First, the performance of the AOA-SVM model is verified. The data set is shuffled before each test; the first 150 groups are taken as the training set and the remaining 66 groups as the test set. To assess the model fairly, the experiment is repeated several times and the average value is taken as the final result. The prediction results of the AOA-SVM model are presented in Table 3.
Table 3. Test results of the AOA-SVM prediction model.
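The shuffle-and-split protocol can be sketched as below. The number of repetitions (`n_runs=10`) and the callback name `evaluate` are assumptions, since the text only says that multiple experiments are averaged; `evaluate` stands for any function that trains on the first list and returns a score on the second.

```python
import random


def repeated_holdout(samples, evaluate, n_train=150, n_runs=10, seed=0):
    """Shuffle, split into train/test, score, and average over n_runs."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        shuffled = samples[:]          # leave the caller's list untouched
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores), scores
```

With 216 samples and `n_train=150`, each run trains on 150 groups and tests on the remaining 66.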
AOA-SVM: arithmetic optimization algorithm and support vector machine; MSE: mean square error.
Table 3 shows that the determination coefficients of the AOA-SVM prediction model are all above 95%, with an average value of 96.77%, indicating that this model has good prediction performance and strong generalization ability, and can be applied to the prediction of soil moisture content.
Figure 6 shows the test results of the AOA-SVM prediction model. The values predicted by the AOA-SVM model track the true values closely, which supports the feasibility and effectiveness of this model.

Figure 6. Test results of the arithmetic optimization algorithm and support vector machine (AOA-SVM) prediction model.
In this section, the performance of the AOA-SVM model is compared with that of SSA-SVM, ELM, CNN, and SVM. All five methods are used to predict the soil moisture content in order to verify the effectiveness of the AOA-SVM model. The specific parameters of the experiment are shown in Table 4.
Table 4. Experiment parameters.
AOA: arithmetic optimization algorithm; SVM: support vector machine; SSA: sparrow search algorithm; ELM: extreme learning machine; CNN: convolutional neural network; ST: soil temperature.
The prediction results are shown in Table 5.
Table 5. Test results of different algorithms.
SVM: support vector machine; ELM: extreme learning machine; CNN: convolutional neural network; AOA-SVM: arithmetic optimization algorithm and support vector machine; MSE: mean square error; MAE: mean absolute error; ME: mean error; SSA-SVM: sparrow search algorithm and support vector machine.
Table 5 shows that AOA-SVM, SSA-SVM, and CNN give the better results, with coefficients of determination above 93%; the coefficients of determination of SVM and ELM are 81.69% and 89.61%, respectively. Compared with the other models, AOA-SVM achieves relatively good performance.
Conclusions
This article proposes a hybrid AOA and SVM prediction model (AOA-SVM) to predict the soil moisture content of tea gardens. The correlation of soil water content with soil temperature, atmospheric temperature, and soil electrical conductivity was analyzed: the correlation with soil electrical conductivity was the strongest, followed by atmospheric temperature, with soil temperature having the least influence. Because the correlation degrees and correlation coefficients of all three are relatively high, soil temperature, atmospheric temperature, and soil electrical conductivity are used as the feature inputs of the model. The AOA is then used to optimize the kernel function and penalty factor of the SVM model, and the optimized AOA-SVM model is applied to the prediction of soil moisture content in tea gardens. Experiments show that the model achieves good prediction results and strong generalization ability. The AOA-SVM model is also compared with SSA-SVM, ELM, SVM, and CNN, and the experimental results show that it outperforms the other models. Therefore, the AOA-SVM model proposed in this paper can improve the prediction accuracy of soil water content, provide reliable guidance for the irrigation and fertilization of tea gardens, and support the scientific management of precise irrigation in tea gardens.
Since soil moisture content is also influenced by other environmental factors, further in-depth research is needed on the relationship between soil moisture content and meteorological parameters and other soil parameters, such as pH and water potential, so that they can be integrated into the model to obtain better predictions.
Since the soil indicators also influence one another and are affected by further environmental factors, more complex networks may be needed to build models that predict more accurately.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Provincial Rural Revitalization Strategy Special fund “Big project + Task list” project in 2020 of Yunfu City, Research and Demonstration of Key technologies of intelligent water and fertilizer drip irrigation system in orchard based on Internet of Things combined big data and Artificial Intelligence (2020020103) and 2018 Guangxi University High-level Innovation Team and Excellence Scholars Program (grant no. Guijaioren [2018] No. 35).
