Abstract
Artificial neural networks (ANNs) have been recognized as powerful tools able to automatically learn complex relationships in data. To the best of our knowledge, ANNs have not yet been applied to forest regeneration modeling. Such models are essential to evaluate the effects of reforestation techniques. To fill this gap, the capacity of ANNs to simulate the initial recruitment of pine species in Mediterranean forests has been evaluated in this study. A feed-forward multilayer neural network has been applied to a case study of pine forests of Castilla La Mancha (Central–Eastern Spain), where seed germination and seedling survival, with or without seed protection, of four pine species (Pinus nigra, Pinus pinaster, Pinus halepensis, and Pinus sylvestris) were observed throughout 10 years under three soil conditions (scalped, wildfire-affected, and unaltered soils). The results demonstrate the good capacity of ANNs to predict both stages of pine initial recruitment. This may help predict the success of natural regeneration in Mediterranean pine forests under different tree species, soil characteristics, and management strategies.
Introduction
Data-driven models, such as artificial neural networks (ANNs), have been developed and used worldwide for a large variety of applications (e.g., in social analysis, Bouleanu et al., 2022; in robotized vehicles, Bredereke, 2022; visual classification, Verma et al., 2022; and traffic management, Golubev and Novikova, 2022). In the environmental fields, ANNs have been exploited, for instance, in hydrology (Zema et al., 2020), air pollution (Ayturan et al., 2018), animal behaviors (Whittingham et al., 2006), water resources forecast (Fotia et al., 2022), biomonitoring (Precup et al., 2022), and many others. In such contexts, ANNs have been applied to complex, nonlinear processes using scarce and highly uncertain data as input. Compared to traditional models, ANNs do not require preliminary information about the processes to be modeled, which are simulated by means of a mathematical approach (Sudheer et al., 2002). Moreover, ANNs provide good accuracy and reliability even when the input dataset is small, incomplete, or noisy, and they work in highly dynamic systems (Tahmasebi et al., 2020). This is due to the capacity of ANNs to learn from the analysis of the available data without requiring reprogramming, by exploiting self-adaptation and validation processes (Gholami et al., 2018).
With specific regard to agriculture, ANNs have been used as powerful prediction tools for several purposes. In more detail, applications are reported for remote sensing of vegetation (Kattenborn et al., 2021), predictions of microclimate and energy optimization in greenhouses (Escamilla-García et al., 2020), and estimations of nutrition level, growth, frost formation, soil moisture, irrigation requirements, and evapotranspiration for several crops (Jha et al., 2019). In forestry management, this important prediction tool has rarely been adopted to model ecological processes in delicate ecosystems. Applications were reported by Gue et al. (2020) to model soil depth and temperature of forest soils, image classification of microorganisms, and classification of dead and surviving trees. Forests perform many functions and provide several ecosystem services, including nutrient cycling, organic matter decomposition, and wood production as well as climate (via carbon sequestration) and water regulation (by preventing soil erosion and flooding; Byrnes et al., 2014). Unfortunately, many factors of abiotic and biotic origin can disturb these ecosystem services and damage the goods supplied by forests (Felipe-Lucia et al., 2018). These factors are, for example, excessive harvesting, insects, diseases, drought, flooding, pollution, and fire (Seidl et al., 2012). In this context, the strategies for forest management should be targeted to the conservation and stability of ecosystems (Bolte et al., 2009). A good understanding of natural regeneration is essential for these targets, since this process plays a fundamental role in stand persistence (Calama Sainz et al., 2017). Improving our understanding of the dynamics of natural regeneration is very important to limit the effects of the disturbance of abiotic and biotic factors and to maintain a healthy and beneficial forest ecosystem (Calama Sainz et al., 2017).
Natural regeneration is a key ecological process that is difficult to predict, due to the variability of tree species growth and site characteristics (Zamora et al., 2001). This is peculiar to the Mediterranean forests, where fire has determined adaptation mechanisms for plant regeneration (Alcañiz et al., 2020; Lucas-Borja et al., 2021; Pausas, 2004). In addition, drought and limited soil water availability are the key constraints to natural regeneration in these areas (Gómez-Aparicio et al., 2008; Lucas-Borja et al., 2020). Some species of pine, one of the most common tree species of Mediterranean forests, can regenerate using different strategies, even after high-severity fire (Kozlowski, 2002). Characteristics of trees, such as the species and basal area (Lucas-Borja et al., 2020), and site management, namely preparation of soil and protection of seed (Ameztegui and Coll, 2015; Sagra et al., 2017), play important roles in the natural regeneration of pines in the Mediterranean environment (Calama Sainz et al., 2017; Modrỳ et al., 2004). In this context, it is essential to predict the initial recruitment of pine trees in relation to species, management, and site characteristics, since this may increase the success of forest restoration projects. However, this difficult task, which is associated with the complexity of the ecological processes as well as with the variability of tree species, management techniques, and site characteristics, requires reliable prediction models that need few input parameters.
In principle, the use of ANNs to simulate the natural regeneration of Mediterranean pine forests is sound, thanks to the prediction ability of this tool. However, the ANN application needs training, optimization, and testing steps carried out by using a suitable set of observations. To the best of the authors’ knowledge, ANNs have never been used for predicting the initial recruitment of pine forests, and the promising capability of this tool requires validation in suitable case studies. The considerations above represent, in our opinion, an important motivation to explore the actual capabilities of ANNs in the aforementioned application domain. In this perspective, our study adopts, as a novel approach to the problem of predicting the initial recruitment of pine forests, an ANN to model two variables associated with the regeneration (seed germination and plant survival) of four pine species (Pinus nigra, Pinus pinaster, Pinus halepensis, and Pinus sylvestris), under three soil conditions (scalped, postfire, and without any management actions), and with or without seed protection. After an analysis of the effects of these soil conditions and management techniques on tree regeneration, the ANN has been trained, optimized, and tested by using a 10-year dataset of observations (2006–2016) of the modeled variables in forest stands of Castilla La Mancha (Central–Eastern Spain). Monitoring was carried out on an annual scale, starting from seeding for each species until emergence (from one to three years from seeding depending on the species). The proposed ANN may be a valid prediction tool to simulate the natural regeneration of Mediterranean forests under different plant species, site characteristics, and management strategies.
The rest of the paper is organized as follows. Section 2 describes materials and methods, Section 3 presents and discusses the results, and Section 4 draws some conclusions.
Materials and Methods
This section briefly describes the ANN architecture, the dataset, and the settings, together with the validation procedure in the case study.
Our research activity can be summarized in the following steps:
1. Definition of the ANN architecture.
2. ANN implementation.
3. Data preprocessing.
4. Validation in the case study: ANN training and ANN test.
The Architecture
A feed-forward multilayer neural network (NN), depicted in Figure 1, has been exploited to simulate the pine initial recruitment. Its architecture consists of a set of nodes (inspired by biological neurons)

An example of multilayer NN architecture. Note. NN = neural network.
More specifically, an NN computes its outputs as follows. Each node
The training algorithm sets the weight
In this work, as activation function (a) the well-known sigmoid function is adopted. More formally:
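For reference, the logistic sigmoid in its standard textbook form is given below. The symbol $x$ for the weighted input of a node is a notational assumption and may differ from the notation used elsewhere in this section.

```latex
a(x) = \frac{1}{1 + e^{-x}}, \qquad a(x) \in (0, 1)
```

Because the output is bounded between 0 and 1, rates expressed in percent would need to be rescaled to fractions before being compared with the network outputs.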
For the experiments, we exploited the Deeplearning4j (Deeplearning4j Suite Overview, 2024) framework to train the ANN. The implementation of the algorithm is described by Deeplearning4j Suite Overview (2024). The networks were trained on real data consisting of 3,614 tuples storing six attributes, namely (i) forest species, (ii) basal area, (iii) seed protection, (iv) soil condition, (v) seed germination rate, and (vi) plant survival rate.
The first four attributes have been used as the NN inputs, while the latter two are used as outputs.
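The input/output layout described above can be illustrated with a minimal feed-forward pass in plain Java. This is a sketch under stated assumptions, not the Deeplearning4j implementation used in the experiments: the class name `ForwardPassSketch`, the single hidden layer of three nodes, the encoded input values, and all weights are hypothetical, and the activation is the standard logistic sigmoid.

```java
import java.util.Arrays;

/** Illustrative feed-forward pass: 4 inputs -> hidden layer -> 2 outputs. */
final class ForwardPassSketch {

    /** Standard logistic sigmoid. */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** One dense layer: out[j] = sigmoid(bias[j] + sum_i in[i] * w[j][i]). */
    static double[] layer(double[] in, double[][] w, double[] bias) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double sum = bias[j];
            for (int i = 0; i < in.length; i++) {
                sum += in[i] * w[j][i];
            }
            out[j] = sigmoid(sum);
        }
        return out;
    }

    /** Propagates the input through every layer in order. */
    static double[] forward(double[] input, double[][][] weights, double[][] biases) {
        double[] activation = input;
        for (int l = 0; l < weights.length; l++) {
            activation = layer(activation, weights[l], biases[l]);
        }
        return activation;
    }

    public static void main(String[] args) {
        // Hypothetical encoded inputs: species, basal area, seed protection, soil condition.
        double[] input = {0.25, 0.40, 1.0, 0.5};
        // Hypothetical weights: 3 hidden nodes x 4 inputs, then 2 outputs x 3 hidden nodes.
        double[][][] weights = {
            {{0.1, -0.2, 0.3, 0.0}, {0.05, 0.1, -0.1, 0.2}, {-0.3, 0.2, 0.1, 0.1}},
            {{0.2, -0.1, 0.4}, {-0.2, 0.3, 0.1}}
        };
        double[][] biases = {{0.0, 0.0, 0.0}, {0.0, 0.0}};
        // Two outputs in (0, 1): germination and survival rates expressed as fractions.
        System.out.println(Arrays.toString(forward(input, weights, biases)));
    }
}
```

In a trained network the weights and biases would be set by the learning algorithm rather than written by hand as they are here.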
Data Preprocessing
First, we preprocessed the data to obtain a suitable dataset for the training. In particular, both the values of forest species and soil condition have been transformed into a real value belonging to the domain
Description of the Study Area
The observation of pine initial recruitment was carried out in forests of the Cuenca Mountain Range of the Iberian System (Castilla La Mancha, Central–Eastern Spain; Figure 2). The elevation of the experimental areas ranges from 600 to 1900 m above sea level. P. nigra, which is naturally distributed between 1000 and 1700 m, also as nearly pure stands, dominates the composition of this forest. Mixed stands of P. nigra and P. pinaster, as well as of P. nigra and P. sylvestris, grow at the lower (1000–1100 m) and upper (1400–1700 m) elevation limits of the study area. At the extremes of this altitudinal range, P. nigra occurs as isolated or residual populations, the latter split into small stands.

Location of the experimental sites (Castilla La Mancha, Spain).
The main soil types in the region are calcium-rich and mainly shallow, over calcareous bedrock (Soil Survey Staff, 1999), while the dominant soils of the studied areas range from Lithic Haploxeroll to Typic Xerorthent.
Preliminarily, literature records were analyzed to identify the distribution area of P. halepensis, P. pinaster, P. nigra, and P. sylvestris in the Castilla La Mancha region. Moreover, several extensive field surveys in the Cuenca Mountain Range were carried out to select the study areas. Three areas were identified: “Arcas del Villar” (AR, lower altitude range), “Los Palancares” (PA, mid-range), and “Tragacete” (TA, higher range) (Figure 2). P. halepensis and P. pinaster were surveyed in AR, P. nigra in PA, while P. sylvestris was monitored in TA. Table 1 reports the main geographical, pedological, and ecological information about the three study areas; more details can be found in the work by Rubio-Moraga et al. (2012).
Main Geographical, Soil and Ecological Information About the Study Areas of AR, PA, and TA (Cuenca Mountain Range, Castilla La Mancha, Spain).
Note. AR = Arcas del Villar; PA = Los Palancares; TA=Tragacete.
All these forests are naturally regenerated stands. Traditional management used the shelterwood method, with a shelter phase of 20–25 years and a rotation period of 100–125 years.
In these sites, three experimental stands, each of about 2 ha, were selected. Two forests were pure stands of P. pinaster (AR) and P. nigra (PA), while the third forest was a mixed stand of P. nigra and P. sylvestris (TA). In each stand, three experimental sites were identified, with lower (10–15 m
Figure 3 illustrates the dataset of observations used for ANN implementation in the case study. Seed germination was the highest for PN in scalped soils with seed protection (on average 83.3

Seed germination and plant survival for four pine species (PN, PS, PP, and PH) regenerating under three soil conditions (S, PF, and C) and two seed management techniques (SP and NSP) in forest stands of Castilla La Mancha (Spain). Note. PN = Pinus nigra; PS = Pinus sylvestris; PP = Pinus pinaster; PH = Pinus halepensis; S = scalped; PF = postfire; C = without action; SP = seed protection; NSP = no seed protection.
Higher plant survival was found in PN compared to PP and PH. The protected seeds gave higher plant survival in PH and PN, while the differences between PP with seed protection and all species that survived without seed protection were low. Soil preparation was effective for plant survival in PH and PN growing on scalped soils, while the other soil preparation techniques marginally influenced plant survival for all species. Finally, the application of seed protection and soil preparation was always effective in both seed germination and plant survival for all species.
In this section, we introduce the Java code used to implement our ANN, as shown in Figure 4.

The Java code implementing the ANN. Note. ANN = artificial neural network.
Let numInput be the number of the
When training our NN with the dataset described above, we set the maximum error to 0.0001, the learning rate to 0.2, and the momentum to 0.7. To measure the NN error, the mean absolute percent error (MAPE) has been computed as:
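MAPE = (100 / n) · Σ |(O_i − P_i) / O_i|, where O_i and P_i are the i-th observed and predicted values and n is the number of observations. This is the standard definition, stated here with assumed symbols. A minimal Java sketch of it follows; the class and method names are hypothetical, the sample values are invented for illustration only, and skipping zero-valued observations (so that n counts only the non-zero ones) is an assumption made here to avoid division by zero, not a choice documented for this study.

```java
/** Mean absolute percent error between observed and predicted values. */
final class MapeSketch {

    /**
     * Returns MAPE in percent. Observations equal to zero are skipped
     * (an assumption of this sketch) because the relative error is undefined there.
     */
    static double mape(double[] observed, double[] predicted) {
        if (observed.length != predicted.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < observed.length; i++) {
            if (observed[i] == 0.0) {
                continue; // relative error undefined for a zero observation
            }
            sum += Math.abs((observed[i] - predicted[i]) / observed[i]);
            count++;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no non-zero observations");
        }
        return 100.0 * sum / count;
    }

    public static void main(String[] args) {
        // Invented values, for illustration only (rates in percent).
        double[] observed = {80.0, 50.0, 20.0};
        double[] predicted = {76.0, 55.0, 20.0};
        System.out.println(mape(observed, predicted));
    }
}
```

Because each term is divided by the observed value, small observed rates weigh heavily on the result, which is worth keeping in mind when comparing MAPE across the two output variables.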
To train and test the proposed NN, the dataset was split into two pairs of subsets (one pair for emergence and one for survival). For each variable, the two subsets consisted of 2,347 and 1,265 tuples.
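A split of this kind can be sketched in plain Java as follows. The shuffling, the fixed seed, and the assumption that the larger subset is the training set are illustrative choices that the text does not specify; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Splits a list of tuples into a training part and a test part. */
final class SplitSketch {

    /** Holds the two subsets produced by the split. */
    static final class Split<T> {
        final List<T> train;
        final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    /** Shuffles a copy of the data with the given seed and cuts it at trainSize. */
    static <T> Split<T> split(List<T> data, int trainSize, long seed) {
        if (trainSize < 0 || trainSize > data.size()) {
            throw new IllegalArgumentException("trainSize out of range");
        }
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed));
        return new Split<>(
            new ArrayList<>(shuffled.subList(0, trainSize)),
            new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
    }

    public static void main(String[] args) {
        // Placeholder row indices standing in for the 2,347 + 1,265 tuples.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2347 + 1265; i++) {
            rows.add(i);
        }
        Split<Integer> s = split(rows, 2347, 42L);
        System.out.println(s.train.size() + " training, " + s.test.size() + " test");
    }
}
```

Fixing the seed keeps the partition reproducible, so that different network configurations can be compared on the same test tuples.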
As a first attempt, we trained the NN by using two different activation functions for the input (i.e., 4 nodes) and the output layers (i.e., 2 nodes), the hyperbolic tangent and the sigmoid functions, respectively. An initial test session, carried out with several ANNs differing in both the number of hidden layers and the number of neurons per layer, gave unsatisfactory results in terms of error with respect to the expected values.
In the second phase, we repeated the same configurations, analyzing a single output at a time. We used a hyperbolic tangent transfer function and backpropagation with momentum as the learning rule, where the momentum is a real value exploited to speed up the learning process and improve the algorithm’s efficiency. First, we trained the NN for the first output, seed germination rate (Table 2). We inserted a single hidden layer with 20 neurons and obtained a MAPE of 8%. Step by step, we increased the number of neurons up to 1,000, obtaining a MAPE of 7.5%. At this point, we increased the number of hidden layers to two, varying the number of neurons. We obtained an error of 7.33% by inserting 50 neurons in both layers. Finally, we ran further tests by increasing the number of hidden layers and by changing the transfer function, but we did not get better results. A MAPE of 7.33% is an important result, because the total error has been reduced (Figure 5). After this first study, the best configuration for the first output, seed germination rate, consisted of two hidden layers with 50 neurons in each layer.
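The role of momentum can be made concrete with the classical update rule, in which a velocity term carries over a fraction of the previous weight change. The sketch below uses the learning rate (0.2) and momentum (0.7) reported in this section, but it is a generic plain-Java illustration rather than the Deeplearning4j updater used in the experiments; the class name, the one-dimensional quadratic loss, and the number of iterations are hypothetical.

```java
/** Classical gradient descent with momentum on a toy one-dimensional loss. */
final class MomentumSketch {

    /** Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3. */
    static double gradient(double w) {
        return 2.0 * (w - 3.0);
    }

    /**
     * Runs the given number of momentum updates:
     *   velocity = momentum * velocity - learningRate * gradient(w)
     *   w        = w + velocity
     */
    static double minimize(double w0, double learningRate, double momentum, int steps) {
        double w = w0;
        double velocity = 0.0;
        for (int t = 0; t < steps; t++) {
            velocity = momentum * velocity - learningRate * gradient(w);
            w += velocity;
        }
        return w;
    }

    public static void main(String[] args) {
        // Learning rate and momentum as reported in the text; the rest is illustrative.
        System.out.println(minimize(0.0, 0.2, 0.7, 200));
    }
}
```

In backpropagation the same rule is applied to every weight of the network, with the gradient of the training error taking the place of the toy gradient used here.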

MAPE (%) for the first output of the ANN (seed germination rate of four pine species, Castilla La Mancha, Spain): (a) 20 neurons; (b) 50 neurons, and (c) 100 neurons. Note. MAPE = mean absolute percent error; ANN = artificial neural network.
MAPE for the First Output of the ANN (Seed Germination Rate of Four Pine Species, Castilla La Mancha, Spain).
Note. The minimum value is highlighted in bold. MAPE = mean absolute percent error; ANN = artificial neural network.
MAPE for the Second Output of the ANN (Plant Survival Rate of Four Pine Species, Castilla La Mancha, Spain).
Note. The minimum value is highlighted in bold. MAPE = mean absolute percent error; ANN = artificial neural network.
Second, we trained another ANN for the second output, plant survival rate (Table 3). Also in this case, we used a hyperbolic tangent as the activation function and backpropagation with momentum as the learning rule. For the tests, we repeated the same procedure as in the previous experiments for the first output (Figure 6). First, we inserted a single hidden layer with 20 neurons and obtained a MAPE of 1.5% (Figure 7).

MAPE (%) for the second output of the ANN (plant survival rate of four pine species, Castilla La Mancha, Spain): (a) 20 neurons; (b) 50 neurons, and (c) 100 neurons. Note. MAPE = mean absolute percent error; ANN = artificial neural network.

The DL4J UI of the ANN for the second output (plant survival rate of four pine species, Castilla La Mancha, Spain). Note. DL4J UI = Deeplearning4J user interface; ANN = artificial neural network.
Step by step, we increased the number of neurons up to 1,000, obtaining a MAPE of 1.47%. By increasing the number of hidden layers and varying the number of neurons, we did not obtain significant improvements in the MAPE value. Next, we inserted three hidden layers with 20 neurons each and obtained a MAPE of 1.2%. By increasing the number of neurons in the three hidden layers, the MAPE remained almost constant. Finally, we ran further tests by increasing the number of hidden layers and by changing the transfer function, but we did not get better results. After this second study, the best-performing architecture for the second output, plant survival rate, consisted of three hidden layers with 20 neurons in each layer.
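The selection step described above amounts to picking the tested configuration with the lowest MAPE. A minimal Java sketch follows, using only the survival-rate figures quoted in the text; the class and type names are hypothetical, and configurations that were tested but not reported with a numerical value are omitted.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Picks the network configuration with the lowest reported MAPE. */
final class ConfigSelectionSketch {

    /** One tested configuration and its MAPE in percent. */
    static final class Config {
        final int hiddenLayers;
        final int neuronsPerLayer;
        final double mape;

        Config(int hiddenLayers, int neuronsPerLayer, double mape) {
            this.hiddenLayers = hiddenLayers;
            this.neuronsPerLayer = neuronsPerLayer;
            this.mape = mape;
        }

        @Override
        public String toString() {
            return hiddenLayers + " hidden layer(s) x " + neuronsPerLayer
                + " neurons, MAPE " + mape + "%";
        }
    }

    /** Returns the configuration with the minimum MAPE. */
    static Config best(List<Config> tested) {
        return tested.stream()
            .min(Comparator.comparingDouble((Config c) -> c.mape))
            .orElseThrow(() -> new IllegalArgumentException("no configurations"));
    }

    public static void main(String[] args) {
        // Survival-rate results quoted in the text.
        List<Config> survival = Arrays.asList(
            new Config(1, 20, 1.5),
            new Config(1, 1000, 1.47),
            new Config(3, 20, 1.2));
        System.out.println(best(survival));
    }
}
```

The same comparison applied to the germination-rate figures quoted earlier (8%, 7.5%, and 7.33%) selects the configuration with two hidden layers of 50 neurons.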
Overall, very low differences in the mean values of seed germination and plant survival were found between field observations and the corresponding simulations using the experimental ANN. Both for ANN training and testing, these differences were lower than 3.2%, which shows the good capacity of this tool in predicting both stages of pine initial recruitment. Conversely, the maximum values of ANN predictions were affected by a noticeable estimation error in the training phase (> 30%), which became very high (over 90%) in the testing phase both for seed emergence and plant survival (Table 4).
Finally, it is worth highlighting that several prediction models are already used in the literature and that their simulations of the output variables generally yield a prediction capacity similar to that of the ANN. Two prerequisites for this satisfactory performance are (i) the availability of proper datasets of input parameters and (ii) the need to validate the predictions in environmental conditions similar to the application conditions. However, the main limitations of these models are: (i) the need for a priori knowledge of the underlying process or assumptions of the targeted function structure; (ii) their lower accuracy compared to ANNs, particularly when the problem is poorly defined; and (iii) their slower implementation procedures compared to ANNs when the modeling conditions are very complex (i.e., the natural regeneration processes as in this study). In contrast, the main advantages of ANNs are (i) the higher efficiency of the training phase compared to the calibration/validation procedures required by the other models and (ii) the possibility of easily improving the network over time by continuing the training on further available data, once the ANN has been previously trained.
This study has explored the possibility of applying ANNs to model the initial recruitment of four pine species in Castilla La Mancha (Central–Eastern Spain) under different management and soil conditions. A feed-forward multilayer NN exploiting a dataset of 3,614 tuples consisting of six attributes, namely (i) forest species, (ii) basal area, (iii) seed protection, (iv) soil condition, (v) seed germination rate, and (vi) plant survival rate, with the latter two representing the ANN outputs, has been used. The best-performing architectures consisted of two hidden layers with 50 neurons in each layer for the first output, seed germination rate (MAPE of 7.33%), and three hidden layers with 20 neurons in each layer for the second output, plant survival rate (MAPE of 1.2%). Both for ANN training and testing, the mean differences in seed emergence and plant survival between field observations and the corresponding simulations were lower than 3.2%. This shows the good capacity of this tool in predicting both stages of pine initial recruitment and, therefore, these results are encouraging toward broader modeling approaches using ANNs in other environmental fields.
Of course, the proposed ANN model has been applied and tested in only one case study, and this modeling exercise must be considered a first validation attempt. Applications to other sites, with similar or different environmental conditions, will be possible once the model has been successfully validated through representative case studies. Such positive validations may provide forest managers with a useful tool to predict natural regeneration rates of vegetation in delicate forest ecosystems of semi-arid areas for real-world applications. Ongoing research will explore the advantages of combining ANNs with other approaches, such as that of Tan et al. (2020), to obtain more dynamic monitoring of forest recruitment.
Statistical Indexes and Differences Between Observations of Pine Initial Recruitment and Simulations by ANN (Castilla La Mancha, Spain).
Note. ANN = artificial neural network.
Footnotes
Authors’ Note
A preliminary version of this study was presented at the 15th International Symposium on Intelligent Distributed Computing (IDC 2022), 14–16 September 2022, Bremen, Germany (Fotia et al., 2023).
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
