Abstract
To detect fire automatically, a forest fire image recognition method based on convolutional neural networks is proposed in this paper. There are two main types of fire recognition algorithms: one is based on traditional image processing technology and the other on convolutional neural network technology. The former is prone to false detection because of blindness and randomness in the feature selection stage, while the latter applies the convolutional neural network directly to unprocessed images, so the features learned by the network are not accurate enough and the recognition rate may suffer. To address these problems, conventional image processing techniques and convolutional neural networks are combined, and an adaptive pooling approach is introduced. With this algorithm, the flame area is segmented in advance and its features are then learned by the network. In this way, the blindness of the traditional feature extraction process is avoided, and so is the learning of invalid features in the convolutional neural network. Experiments show that the convolutional neural network with adaptive pooling performs better and achieves a higher recognition rate.
Introduction
With the rapid development of digital camera technology and image processing technology, flame detection based on computer vision has gradually replaced the traditional methods and has become an important trend.1,2 Because forest fire images have complex backgrounds and cover large spaces, forest fire identification is difficult; in particular, the feature selection process often involves blind operations. Applying convolutional neural network (CNN) technology to image recognition can largely avoid the blindness and randomness of the feature extraction process and, in theory, extract deeper features, which can greatly improve the accuracy of flame image recognition. CNN technology has been applied to fire image recognition by many researchers.3,4
Wang 5 used logistic regression and a support vector machine to classify the images in the output classification layer of the CNN for forest fire detection. The experiments show that the accuracy of the forest fire detection algorithm based on deep learning exceeds that of the traditional algorithm. Chen et al. 6 used a block detection method to quickly preprocess forest fire video, which greatly reduced the running time of the whole system. This method uses a graphics processing unit (GPU) to accelerate the texture analysis and improves the real-time performance of the system. Fu et al. 7 designed different network models to identify forest fire images against different backgrounds at night and during the day, and mainly analyzed the performance of the network under different parameters. The experimental results show that the CNN-based forest fire flame recognition method has prominent advantages over the flame detection method based on traditional image processing. Frizzi et al. 8 proposed a CNN method for detecting fires in video. Tested on real video sequences, the proposed method achieves better classification performance than other related traditional video forest fire detection methods. Hohberg 9 applied a convolutional 3D (C3D) network, which can extract temporal features, to forest fire detection. The experimental results show that the trained networks clearly distinguish wildfires from other objects.
These scholars have contributed to the identification of forest fires and achieved good recognition results. However, in most of these studies, the original images were input directly to the CNN for training. Because the original forest fire images have complex backgrounds and take various forms, non-flame features are also learned during training, which makes the generalization performance of the network poor. For this reason, conventional image processing techniques and CNN methods are combined in this paper. First, the candidate flame area is segmented, and then the CNN is used to recognize the fire flames. In the pooling process of the CNN, an adaptive pooling method is introduced.
Forest fire recognition based on CNN
Flow of the algorithm
At present, most applications in forest fire identification apply the CNN directly to the original image set. Because the original images contain complex backgrounds and many sources of interference, the training results are unsatisfactory. Therefore, in this paper, a method is proposed that segments the candidate flame region based on color features and then sends only that part of the image to the CNN for training, which extracts features more specifically and effectively improves the recognition rate of forest fire images. The algorithm flow is shown in Figure 1.

Flow chart of forest fire image recognition algorithm based on CNN.
In the training phase, the binary image of the suspected flame region is first segmented. The result of an AND operation between the binary image and the original image is used as the training set, and a label is set for each image. A network model is obtained by training the CNN on this training set. In the testing phase, the binary image of the suspected flame region is segmented in the same way, and the result of the AND operation with the original image is used as the testing set. The testing set images are sent to the trained network model to obtain the recognition result.
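The AND step described above can be sketched as follows. This is a minimal illustration, assuming an 8-bit RGB image stored as an H × W × 3 NumPy array and a binary mask of shape H × W; the function name `apply_flame_mask` and the sample values are hypothetical and do not come from the paper.

```python
import numpy as np

def apply_flame_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep image pixels where the binary mask is nonzero; zero out the rest."""
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")
    # Broadcasting the boolean mask over the channel axis has the same effect
    # as AND-ing every 8-bit channel with 0xFF (keep) or 0x00 (discard).
    keep = mask.astype(bool)[..., None]
    return np.where(keep, image, 0).astype(image.dtype)

# Hypothetical 2 x 2 image: the left column is marked as suspected flame.
image = np.array([[[200, 120, 30], [10, 80, 20]],
                  [[250, 180, 60], [40, 40, 40]]], dtype=np.uint8)
mask = np.array([[1, 0],
                 [1, 0]], dtype=np.uint8)
masked = apply_flame_mask(image, mask)
```

The same function would be applied to both the training and the testing images, so that the network only ever sees the candidate flame region.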
Segmentation of suspected flame area
Color is a very important feature of fire images. In this paper, flame pixels are segmented using features in the RGB color space and the YCbCr color space.10 To analyze the characteristics of flame pixels in RGB color space, a coordinate system is established with the number of pixels on the abscissa and the pixel value on the ordinate, and the distribution of the flame pixels on the color channels is plotted, as shown in Figure 2.

Distribution of flame pixel values on channels
From Figure 2, the rule satisfied by flame pixels in RGB color space can be found, as shown in equation (1)
Similarly, the characteristics of the flame pixel in YCbCr color space can be analyzed, as shown in Figure 3.

Characteristic distribution of flame pixels in YCbCr color space. (a) Distribution of flame pixel values on
From Figure 3(a), the rule satisfied by flame pixels in YCbCr color space can be found, as shown in equation (2)
In this paper, the flame image is analyzed pixel by pixel according to the feature information in equations (1) and (2) to segment the candidate flame region. The flowchart is shown in Figure 4.

Flow chart of fire image segmentation based on color feature.
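A pixel-wise color segmentation of this kind can be sketched as below. The inequalities and the threshold `r_threshold` are placeholders chosen for illustration only; they are assumptions and are not the rules of equations (1) and (2), which are not reproduced in this text. The RGB to YCbCr conversion uses the full-range BT.601 (JPEG) coefficients, which is also an assumption about the conversion used.

```python
import numpy as np

def rgb_to_ycbcr(image: np.ndarray) -> np.ndarray:
    """Full-range BT.601 (JPEG) RGB -> YCbCr conversion, returned as floats."""
    rgb = image.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def segment_flame(image: np.ndarray, r_threshold: int = 150) -> np.ndarray:
    """Return a binary mask (1 = candidate flame pixel) from color rules.

    The rules below are HYPOTHETICAL stand-ins for equations (1) and (2).
    """
    r, g, b = (image[..., k].astype(np.int32) for k in range(3))
    ycbcr = rgb_to_ycbcr(image)
    y, cb, cr = ycbcr[..., 0], ycbcr[..., 1], ycbcr[..., 2]
    rgb_rule = (r > r_threshold) & (r >= g) & (g > b)   # assumed RGB rule
    ycbcr_rule = (y > cb) & (cr > cb)                    # assumed YCbCr rule
    return (rgb_rule & ycbcr_rule).astype(np.uint8)

# One row of three hypothetical pixels: orange, green, and gray.
pixels = np.array([[[230, 140, 40], [30, 120, 40], [200, 200, 200]]],
                  dtype=np.uint8)
flame_mask = segment_flame(pixels)
```

Under these assumed rules only the orange pixel is kept; the resulting mask is what would then be AND-ed with the original image.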
Selection and structure of the model
In this paper, a CNN model based on AlexNet is designed to recognize the flame area. Specifically, it consists of three convolution layers, three pooling layers, and two fully connected layers, plus one input layer and one output layer. The network is shown in Figure 5, and the network structure is listed in Table 1.

Convolutional neural network structure of fire recognition.
Structure of the network.
CNN based on adaptive pooling
The traditional pooling methods are maximum pooling11 and mean pooling,12 and their formulas are given in equations (3) and (4)

Max pooling. (a) 4 × 4 feature map. (b) Result of max pooling.

Average pooling. (a) 4 × 4 feature map. (b) Result of average pooling.
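The two traditional pooling operations can be illustrated with a short sketch. It assumes non-overlapping 2 × 2 pooling windows on a 4 × 4 feature map; the window size and the sample values are assumptions for illustration and are not taken from Figures 6 and 7.

```python
import numpy as np

def pool2d(feature_map: np.ndarray, size: int = 2, mode: str = "max") -> np.ndarray:
    """Non-overlapping pooling over size x size windows of a 2D feature map."""
    h, w = feature_map.shape
    if h % size or w % size:
        raise ValueError("feature map dimensions must be divisible by the window size")
    # Axes 1 and 3 index the position inside each pooling window.
    windows = feature_map.reshape(h // size, size, w // size, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "mean":
        return windows.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'mean'")

feature_map = np.array([[1, 2, 5, 6],
                        [3, 4, 7, 8],
                        [9, 10, 13, 14],
                        [11, 12, 15, 16]])
max_pooled = pool2d(feature_map, mode="max")     # [[4, 8], [12, 16]]
mean_pooled = pool2d(feature_map, mode="mean")   # [[2.5, 6.5], [10.5, 14.5]]
```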
These two pooling methods are undeniably cheap to compute. In some extreme cases, however, they perform poorly. The results of maximum pooling and average pooling on such a pooling domain are shown in Figures 8 and 9.


Extreme max pooling effect. (a) 4 × 4 feature map. (b) Result of max pooling.

Extreme average pooling effect. (a) 4 × 4 feature map. (b) Result of average pooling.
It can be seen that the results of traditional pooling cannot accurately represent all the information in the original feature map. Therefore, Liu et al.13 proposed an optimization algorithm based on average pooling and maximum pooling, in which both traditional pooling models are considered. Its expression is shown in equation (5)
With this model, the feature extraction process can be optimized and the performance of the network can be improved. However, various kinds of pooling domains appear during network training. A traditional CNN extracts features with a fixed pooling model, which inevitably loses local information of the image. To address this problem, an adaptive pooling method based on median pooling is proposed in this paper, so that feature extraction can be performed dynamically according to the information in the pooling domain. Its expression is as follows
In equation (6), two pooling factors
Both mean pooling and maximum pooling are considered in the design of the adaptive pooling model, and the pooling method can be selected dynamically. As a result, the loss of image information caused by the traditional pooling methods during feature extraction is reduced.
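The idea of choosing between mean and maximum pooling per pooling domain can be sketched as a weighted blend. The weighting rule below is a hypothetical stand-in and is not equation (6), whose pooling factors are not reproduced in this text; it only illustrates how a per-window weight lets the output move between the mean and the maximum. Non-overlapping 2 × 2 windows are assumed.

```python
import numpy as np

def adaptive_pool2d(feature_map: np.ndarray, size: int = 2, eps: float = 1e-8) -> np.ndarray:
    """Blend max and mean pooling with a weight computed per pooling window."""
    h, w = feature_map.shape
    if h % size or w % size:
        raise ValueError("feature map dimensions must be divisible by the window size")
    windows = feature_map.astype(np.float64).reshape(h // size, size, w // size, size)
    w_max = windows.max(axis=(1, 3))
    w_mean = windows.mean(axis=(1, 3))
    w_min = windows.min(axis=(1, 3))
    # HYPOTHETICAL weight in [0, 1]: close to 0 when a single spike dominates
    # the window (lean towards the mean), close to 1 when most values are
    # already near the maximum (lean towards the max).
    lam = (w_mean - w_min) / (w_max - w_min + eps)
    return lam * w_max + (1.0 - lam) * w_mean

# Top-left window: one isolated spike. Top-right window: mostly large values.
feature_map = np.array([[9, 0, 9, 9],
                        [0, 0, 9, 0],
                        [4, 4, 1, 2],
                        [4, 4, 3, 4]])
pooled = adaptive_pool2d(feature_map)
```

With this assumed rule, the isolated spike is pooled to a value well below the maximum, while the window that is mostly large stays close to it, so neither extreme case of fixed pooling is reproduced.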
Experiment results and analysis
The performance of a CNN is affected by several hyperparameters, including the initial learning rate, the batch size, and the pooling method. The performance of the CNN is analyzed with respect to each of these aspects.
Impact of learning rate on network performance
The cost function
Impact of initial learning rate on network performance.
When the initial learning rate is 1 or 0.1, the loss value explodes immediately. Although the network converges quickly, the final training result is poor: the accuracy on the validation data set is 0, and all the validation images are classified into the same category. When the initial learning rate is set too small, such as 0.0001, the network still has not converged after 2000 epochs, which shows that convergence is slow. When the initial learning rate is 0.01 or 0.001, the network converges more quickly, the accuracy reaches 100% during training, and the network generalizes well. Taking both time and performance into consideration, the initial learning rate is preferably set between 0.001 and 0.01.
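The qualitative behavior described above (divergence for large rates, slow progress for very small ones) can be reproduced on a toy problem. The sketch below runs plain gradient descent on a one-dimensional quadratic loss; the loss function, its curvature, and the step count are assumptions chosen for illustration and have nothing to do with the paper's network or data.

```python
def final_loss(lr: float, steps: int = 100, curvature: float = 30.0, w0: float = 1.0) -> float:
    """Run gradient descent on L(w) = 0.5 * curvature * w**2 and return the final loss."""
    w = w0
    for _ in range(steps):
        w -= lr * curvature * w  # dL/dw = curvature * w
    return 0.5 * curvature * w * w

initial_loss = 0.5 * 30.0 * 1.0 ** 2
for lr in (1.0, 0.1, 0.01, 0.001, 0.0001):
    print(f"lr={lr:<7} final loss={final_loss(lr):.3e} (initial {initial_loss})")
```

For this toy loss the update multiplies `w` by `1 - lr * curvature` at every step, so the loss grows without bound when that factor has magnitude above 1 and shrinks only slowly when the factor is close to 1.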
Impact of batch size on network performance
In the training process, batch processing is often used, and the batch size determines how well and how fast the model is optimized. Different batch sizes are used to train the network structure shown in Table 1 on 500 flame and non-flame images. The data analysis of the training results is shown in Table 3.
Training results under different batch size.
From Table 3, combining the final training loss and the score on the testing set, a batch size of 64 is selected for the subsequent experiments.
Impact of pooling model on network performance
To verify the impact of adaptive pooling on the performance of the CNN, the structure in Table 1 is adopted. The batch size is set to 64, the learning rate is set to 0.005, and the number of iterations is set to 200. The training results for the average pooling, maximum pooling, and adaptive pooling modes are shown in Figure 10.

Training results in different pooling modes. (a) Average pooling. (b) Max pooling. (c) Adaptive pooling.
It can be observed from Figure 10 that after the 25th training iteration, the accuracy of the adaptive pooling mode remains unchanged at 1.0, and the training effect is slightly better than that of average pooling and maximum pooling. To illustrate this advantage, the testing accuracy of each pooling mode is compared. Sixteen non-fire frames and 16 fire frames form the testing set, and the average of three tests is taken. The results are shown in Table 4.
Test accuracy under different pooling modes.
It can be seen from Table 4 that the testing accuracy of adaptive pooling is higher than that of mean and maximum pooling; thus, the feasibility of adaptive pooling is verified.
Feasibility verification of the proposed algorithm
To verify the feasibility of the proposed algorithm, the features in the intermediate layers of the CNN are extracted. Figure 11 shows the feature maps of the hidden layers when training is based on the original image.

CNN hidden layer features of original fire image. (a) Original image. (b) Feature map with C-1. (c) Feature map with S-2. (d) Feature map with C-3. (e) Feature map with S-4.
The feature maps of the CNN hidden layers when the flame region is segmented before training are shown in Figure 12.

Fire image features of CNN hidden layer combined with color feature. (a) Original image. (b) Feature map with C-1. (c) Feature map with S-2. (d) Feature map with C-3. (e) Feature map with S-4.
In Figure 11, since there are many trees in the background, this part is extracted as a main feature in the feature map. In Figure 12, only the fire area exists, so features such as shape and texture can be observed clearly in the feature maps of the hidden layers.
To verify the algorithm, training samples that contain only the flame region are used with the structure in Table 1. The results are shown in Table 5.
Comparison of fire recognition rate under different training samples.
Table 5 shows that the recognition rate is only 84.2% and 81.6% when training is based on the original images. If the segmented fire area images are used as training samples and the original images as testing samples, the accuracy is as low as 80.4%. This is because the training samples contain only fire features without background information. When the segmented images are used as both training and testing samples, the accuracy is as high as 90.7%. This further shows that the CNN fire recognition algorithm combined with color features has higher accuracy.
Conclusion
In this paper, a forest fire image recognition algorithm based on CNN is presented. Its main feature is that the segmented flame image is used for training and testing. An AlexNet-based model is introduced, and an adaptive pooling method combined with color features is proposed to address the problem that the traditional pooling methods in CNN may weaken the image features in some cases. The effects of learning rate, batch size, and other parameters on the performance of the CNN are analyzed experimentally, and the optimal parameters are determined. The candidate flame area is extracted based on color features; thus, the image features of the non-flame area in the hidden layers are reduced, and features such as shape and texture are enhanced. The loss of image information is avoided because adaptive pooling is adopted, and the flame recognition rate obtained when the fire area is segmented is higher than that obtained when the original image is used without segmentation. The results show that the proposed algorithm has a high recognition rate and is feasible. In this paper, the pooling of the CNN is modified and applied to forest fire image recognition; in future work, the recognition rate and time consumption will be studied in more depth and compared with other algorithms.
Footnotes
Acknowledgements
This work (grant no. 61703329) was supported by the National Natural Science Foundation of China and the National Key Research and Development Program of Shaanxi Province, China (2019KW-046). The pictures were from Corsica fire Database which was made by the University of Corsica.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work (grant no. 61703329) was supported by the National Natural Science Foundation of China and the National Key Research and Development Program of Shaanxi Province, China (2019KW-046). The pictures were from Corsica fire Database which was made by the University of Corsica.
