Abstract
The existing equipment for civil aircraft cargo fire detection mainly uses photoelectric smoke detectors, which have a high false alarm rate. According to Federal Aviation Administration (FAA) statistics, the false alarm rate is as high as 99%. 1 In the cargo compartment of civil aircraft, traditional photoelectric detection technology cannot effectively distinguish interference particles from smoke particles. Since video smoke detection technology has proven reliable in many large-scale scenarios, a deep learning image-processing method for fire detection is proposed. The proposed convolutional neural network consists of a front-end network and a back-end network, cascaded with a capsule network and a circularity computation for extracting texture from dynamic infrared fire images. In order to accurately identify whether there is a fire in the area and to determine the kind of burning substance, a series of fuels, such as n-heptane, cyclohexane, and carton, are selected for combustion, and an infrared camera is used to capture infrared images of each fuel as it burns. Experimental results show that the proposed method can effectively detect fire at an early stage and is applicable to fire detection in civil aircraft cargo compartments.
Introduction
Traditional fire detection sensors are inexpensive and easy to use, detecting fires by sampling smoke particles, atmospheric temperature, and relative humidity. However, these sensors cannot accurately distinguish between combustion particles and interference particles, such as dust, haze and feather lamps, which can easily cause false alarms in the detection system. 2 In addition, these point sensors do not provide the actual location and size of the fire. In contrast, cameras can overcome these limitations by monitoring the amount of fire and its size and growth rate. 3 Therefore, the video surveillance cameras that are widely used in security applications can also be used in open-space fire monitoring systems, where detecting combustion particles gives early warning of fire hazards. Various methods have been proposed in recent years to detect fires effectively in a video sequence and make such detection suitable for real-time applications.4–6 One of the main challenges of fire detection is extracting the combustion process, because its many characteristics, such as color, shadow, motion, and density, vary with the type of combustion material and the environment. For this reason, feature extraction has been extensively studied in the literature.
When a fire occurs, the flame shows different image characteristics as the fire develops. How to image the flame at different times so that its shape is properly represented is a research hot spot at home and abroad. Zou et al. 7 used image fusion technology to extract characteristic parameters from infrared images. In their system design, matching of the feature parameters is used to judge whether an infrared image can be automatically recognized and accurately detected, but the system design is too cumbersome to be applicable to actual conditions. Lin et al. 8 used image processing techniques with a 3D convolutional network to obtain the characteristic parameters of the image, aiming to determine the characteristics of the flame and the occurrence of fire. Because experimental verification and data acquisition are lacking, the work is not strongly persuasive. Some researchers9–11 used experimental methods to describe the smoldering phenomenon in infrared images. However, this covers only the initial stage of combustion and not the whole combustion process, so the research method needs to be improved. Yang 12 used the HSI method to segment the flame image and obtain the location of the ignition point. However, influencing factors such as background interference arising from the color-space algorithm may cause inaccurate segmentation. Wang et al. 13 used experimental methods to extract the flame height. Although this effectively avoided subjective human judgment in image processing, the experimental fuels selected did not cover most fire situations, so the results were not comprehensive.
Zhao et al. 14 used DSP technology to extract the infrared image features and then used the Sobel algorithm to calculate the image feature matrix and match the picture. However, signal noise interference is always present in this image processing mode. Luan et al. 15 used a BP neural network to study the color and texture features of video images in depth in order to obtain more accurate flame features, but ignored key parameters such as the learning coefficients and thresholds of the BP neural network; the resulting errors make the learned results inaccurate. Tan et al. 16 adopted an improved probabilistic neural network to fuse the colors in the flame and extracted an accurate flame contour for training and testing. Although the improved model eliminated human error to some extent, the insufficient sample size for training and testing also leads to inaccurate results. Literature 17 focuses on noise interference in the image and proposes an adaptive median filtering method to remove image noise. Literature 18 discussed only contrast enhancement of infrared images, put forward an algorithm based on the characteristics of the HP memristor, and justified the algorithm with experimental results. However, both works analyze only image enhancement in detail and do not address other aspects of image processing, so they are not comprehensive enough. Wang et al. 19 constructed a spiking neural network, used it to detect the edges of infrared images, and verified its effectiveness experimentally, but a discussion limited to one aspect of image processing cannot solve the problem comprehensively.
Qiu et al. 20 used rough fuzzy set theory to locate the infrared image, integrated the LBP model into the GMM to extract the target from the natural image, merged the motion regions of the two, and then used the CV model to extract the target accurately. Because no experimental proof of the validity of the final theoretical model is given, the work is incomplete. Although researchers at home and abroad have proposed many solutions to the problem of infrared image processing, each has shortcomings, large or small, so another way to properly process infrared flame images is needed. Jia et al. 21 used color and motion characteristics to compute saliency maps for detecting fire. Among the various methods, neural network algorithms can extract the characteristics of fire images rapidly and accurately. However, convolutional neural networks require a large amount of data, which makes them computationally expensive. Literature22,23 addressed various characteristic parameters of the flame image in different scenarios and obtained a general method of video flame image processing by judging the flame image according to the nature of the characteristic parameters. Because only simulation was used, without experiments, the error in the extracted parameters increases. In recent years, convolutional neural network architectures have performed well in many visual processing tasks, including image and video classification,24–27 object and face detection, 28 crowd analysis, and speech recognition. The remarkable achievements of convolutional neural networks in visual processing are largely due to their outstanding abstraction capabilities. Therefore, it is reasonable to apply them to the specific task of fire detection.
In this paper, infrared images of three kinds of combustion materials are obtained experimentally, and the proposed cascaded method is then used to detect the fire. For comparison, the temperature curves of the three categories of fuels are measured by thermocouples and compared with the infrared detection results.
Materials and methods
Inspired by the great success of convolutional neural networks in a variety of computer vision tasks, Tao et al. first applied AlexNet 29 to video smoke detection and achieved state-of-the-art performance by automatically learning texture from a single frame. The feature representation is easier to implement, but it cannot capture motion information between frames. Therefore, a deep architecture that can extract motion features would help improve the recognition capability of the network. To meet this demand, and building on the ability of the convolutional neural network models in the literature above to extract features automatically, we take inspiration from the cascaded ‘weak’ classifier idea of the Viola-Jones method 30 and propose a two-tier cascaded convolutional neural network for fire detection.
Network construction
Convolutional neural networks with capsule networks are used for network construction in this paper; the simplified network structure is shown in Figure 1. For a feature map of size m*n, a convolution is performed with a kernel of size k*k, and the output of the convolutional layer is
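A standard form of this operation can be sketched as follows. The sketch assumes stride 1 and no padding (the text does not state either), so an m*n feature map and a k*k kernel give an output of size (m − k + 1)*(n − k + 1). The function name `conv2d_valid` and the optional bias term are illustrative, not taken from the paper.

```python
import numpy as np

def conv2d_valid(feature_map, kernel, bias=0.0):
    """Stride-1, no-padding 2-D convolution (cross-correlation form, as used in most CNN libraries)."""
    m, n = feature_map.shape
    k = kernel.shape[0]
    out = np.empty((m - k + 1, n - k + 1))
    for i in range(m - k + 1):
        for j in range(n - k + 1):
            # Weighted sum of the k x k window whose top-left corner is (i, j).
            out[i, j] = np.sum(feature_map[i:i + k, j:j + k] * kernel) + bias
    return out

x = np.arange(25, dtype=float).reshape(5, 5)   # toy 5 x 5 feature map
w = np.ones((3, 3))                            # toy 3 x 3 kernel
y = conv2d_valid(x, w)                         # output is 3 x 3
```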

Figure 1. The structure of the C_CNN network.
Because the rectified linear unit (ReLU) function learns faster and effectively avoids the saturation problem that may occur during training, this article uses the ReLU function as the activation function.
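The ReLU function is commonly written as f(x) = max(0, x). The short sketch below compares its gradient with that of a sigmoid at a large input to illustrate the saturation argument; the sigmoid is used only as a contrasting example and is not mentioned in the paper.

```python
import numpy as np

def relu(x):
    """ReLU activation: passes positive inputs, zeroes negative ones."""
    return np.maximum(x, 0.0)

def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return (np.asarray(x) > 0).astype(float)

def sigmoid_grad(x):
    """Gradient of the logistic sigmoid, which shrinks toward 0 for large |x|."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

activations = relu(np.array([-2.0, 0.0, 3.0]))
```

For an input of 10, the ReLU gradient stays at 1 while the sigmoid gradient is below 1e-4, which is the saturation effect the text refers to.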
The back-end network structure is more complex than the front-end network. The convolution kernels of the different convolutional layers in the neural network differ in size. The back-end network has one input layer, four convolution layers, one pooling layer, one fully connected layer, and one output layer. After the convolution and pooling operations, the result is input to the fully connected layer, and the nonmaximum suppression algorithm is used to select the optimal output detection window, which completes the fire detection process.
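The window selection step can be illustrated with the usual greedy form of nonmaximum suppression. This is a minimal sketch, assuming boxes given as (x1, y1, x2, y2) with a confidence score each and an intersection-over-union threshold of 0.5; the box format, the threshold and the function names are assumptions, not details given in the paper.

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union between one box and an array of boxes (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Two overlapping candidate windows and one separate window (toy values).
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
```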
The cascaded neural network
The capsule network is a high-performance deep learning network structure as shown in Figure 2. In the structure, each feature is represented by a vector. The network model corresponding to the structure is trained to extract image features. Each feature of the capsule is a vector, which is progressively clustered to combine features.

Figure 2. The framework of the capsule network.
In order to give the acquired features strong expressive and descriptive ability, a vector neuron is used to store the pose of the target, and the target is identified by calculating the probability that a given feature point of the target appears at a given position in the image.
In Figure 3, the input vector v is processed into a new input vector by matrix W

Figure 3. The structure of the vector neurons.
Thus, the clustering centers for the n capsules of the previous layer can be obtained. The clustering metric is obtained by computing the normalized inner product.
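The transformation and clustering steps described above resemble the widely used dynamic routing-by-agreement procedure for capsule networks. The sketch below assumes that procedure; whether the network in this paper follows it exactly is not stated. The squashing nonlinearity, the three routing iterations, the 32 input capsules and the random weights are illustrative assumptions; only the capsule dimensions (8 in, 16 out, six output capsules) are taken from the training description later in this paper.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(v, W, iterations=3):
    """v: (n_in, d_in) input capsules; W: (n_in, n_out, d_out, d_in) transformation matrices."""
    u_hat = np.einsum('ijkl,il->ijk', W, v)          # new input vectors, (n_in, n_out, d_out)
    b = np.zeros(u_hat.shape[:2])                    # routing logits
    for _ in range(iterations):
        c = softmax(b, axis=1)                       # coupling coefficients over output capsules
        s = np.einsum('ij,ijk->jk', c, u_hat)        # weighted cluster centers, (n_out, d_out)
        out = squash(s)
        b = b + np.einsum('ijk,jk->ij', u_hat, out)  # agreement measured by inner product
    return out

rng = np.random.default_rng(0)
v = rng.normal(size=(32, 8))                  # 32 input capsules is a hypothetical count
W = rng.normal(size=(32, 6, 16, 8)) * 0.1     # random stand-in for learned weights
capsules = dynamic_routing(v, W)
```

The length of each output vector can then be read as the probability that the corresponding feature is present.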
Experiments with three categories of fuels
In this paper, a Fluke TiS10 thermal imager (USA) was selected as the equipment for collecting infrared images. The 8-cm-diameter pools of n-heptane and cyclohexane liquid fuel and the 13 × 8 × 4.5 cm carton were burned in the 1.3 × 1 × 1.6 m experimental chamber. The test was repeated three times for each fuel. The specific fuel parameters are shown in Table 1. The flow chart of the algorithm proposed in this paper is shown in Figure 4.
Table 1. Experimental parameters of fuel.

Figure 4. Flow chart of the proposed algorithm.
The combustion fuels in Table 1 are n-heptane, cyclohexane, and carton. The cylindrical container holding the pool has a radius of 4 cm and a height of 10 cm. The length of the plank of tinned paper is 150 cm. First, an electronic balance is used to fix the mass of each fuel to be burned. The three kinds of fuels are then ignited using an ignition device, and a flame infrared image of each fuel is taken separately, with the infrared device mounted on a bracket at 70 cm in the experimental chamber, as shown in Figure 5.

Figure 5. (a) n-Heptane infrared image; (b) cyclohexane infrared image; and (c) carton infrared image.
The burning time was recorded using a mobile phone during fuel combustion, and the temperature distribution fields of the three fuels were measured using five thermocouples spaced 5 cm apart and recorded using a paperless analyzer for subsequent data analysis. Finally, the captured infrared images were imported into the computer and fed into the constructed cascaded neural networks.
Since the infrared images are taken under external interference, image enhancement is performed beforehand to diminish the background noise. The infrared images of the three fuels are first converted to grayscale and then binarized, as shown in Figure 6. An edge detection method for motion features is used to draw their outlines.
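The grayscale, binarization and outline steps can be sketched as follows. This is a minimal illustration, not the processing chain used in the experiments: the luminance weights, the fixed threshold of 128 and the simple boundary-pixel rule used for the outline are all assumptions, and the test image is synthetic.

```python
import numpy as np

def to_gray(rgb):
    """Convert an RGB image to grayscale with common luminance weights (an assumed choice)."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])

def binarize(gray, threshold):
    """Pixels at or above the threshold become 1 (flame), the rest 0 (background)."""
    return (gray >= threshold).astype(np.uint8)

def edge_outline(binary):
    """Keep foreground pixels that touch at least one 4-connected background pixel."""
    padded = np.pad(binary, 1, mode="constant")
    core = padded[1:-1, 1:-1]
    all_neighbours_fg = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                         & padded[1:-1, :-2] & padded[1:-1, 2:])
    return (core & (1 - all_neighbours_fg)).astype(np.uint8)

# Synthetic 8 x 8 image with a bright 4 x 4 block standing in for a flame region.
image = np.zeros((8, 8, 3))
image[2:6, 2:6] = 255.0
binary = binarize(to_gray(image), threshold=128)
edges = edge_outline(binary)
```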

Figure 6. The binary pattern and edge detection diagram of the three combustion fuels: (a) n-heptane; (b) cyclohexane; and (c) carton.
Different fuels have different threshold response ranges. To solve the noise interference problem of infrared images during shooting, the threshold interval of each fuel must first be calculated. The noise filtering method is then selected according to the threshold response range of the fuel. The histogram threshold method is used to find the threshold interval of the three categories of fuels, as shown in Figure 7.
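A histogram of this kind can be computed as a normalized gray-level histogram, from which the most frequent level and its probability of occurrence are read off. The sketch below assumes 256 gray levels and uses a hypothetical helper name `histogram_peak`; the pixel values are toy data, not measurements from the experiments.

```python
import numpy as np

def histogram_peak(gray, bins=256, value_range=(0, 256)):
    """Return the most frequent gray level and its probability of occurrence."""
    counts, edges = np.histogram(gray, bins=bins, range=value_range)
    prob = counts / counts.sum()          # normalize counts to probabilities
    peak = int(np.argmax(prob))
    return edges[peak], prob[peak]

# Toy pixel values: half of the pixels sit at gray level 20.
pixels = np.array([20] * 5 + [30] * 3 + [200] * 2)
level, probability = histogram_peak(pixels)
```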

Figure 7. The threshold histograms of (a) n-heptane; (b) cyclohexane; and (c) carton.

Figure 8. The training process of the network.
It can be seen from Figure 7 that the threshold of n-heptane fuel is concentrated at the 20 threshold point, where the probability of occurrence is about 0.25; the threshold of cyclohexane fuel is concentrated at the 25 threshold point, where the probability of occurrence is 0.28. The threshold of the carton fuel is concentrated at the 0 threshold point, where the probability of occurrence is about 0.38. Figure 8 shows that as the number of training iterations increases, the loss value decreases and reaches a steady state.
Thus, n-heptane and cyclohexane have basically the same threshold response range but different probabilities of occurrence, whereas the threshold and the probability interval of the solid carton fuel differ greatly from those of the two liquid fuels, which results in a gap in the area of the infrared images of the different fuels. Second, the three histograms show that the threshold distribution of each of the three fuels has a single peak and basically follows a normal distribution.
Data training and analysis
The parameter settings of the two-level convolutional neural network directly affect the detection accuracy and detection speed of the model. The front-end network model should not be too complicated, since that would slow the pre-judgment and thus the overall fire detection speed. The back-end network model should not be oversimplified, to avoid a drop in fire detection accuracy. Therefore, in the cascaded neural network model, the setting of the network parameters is very important. The cascaded convolutional neural network (CNN) model was trained using the dataset of the State Key Laboratory of Fire Science. The dataset contains 5000 fire images with different lighting environments, flame types, and backgrounds. The experiment uses these data to train the capsule network. The feature extraction process is simply normalized, and the dimensions of the capsule network are changed between experiments. The first layer of the capsule network portion is the input layer, and the input image has a size of 10 × 10. The second layer is a convolutional layer with a 2 × 2 kernel and a VALID-type convolution with a step size of 1. This layer converts the pixel intensities into local feature detection information. The third layer performs a VALID-type convolution with a step size of 2 and combines the results into capsule vectors with a dimension of 8. The fourth layer produces a total of six capsules, each of which is a vector with a dimension of 16; the output of the third layer passes through a transformation matrix and a dynamic routing process to give the output of the fourth layer. The loss value obtained during training is shown in Figure 8. After 15,000 iterations, the loss drops to around 0.02 and eventually stabilizes at 0.01.
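The spatial sizes implied by these layers follow from the usual VALID-convolution rule, output = floor((input − kernel) / stride) + 1. The sketch below applies it to the stated 10 × 10 input; the kernel size of the third layer is not given in the text, so the value 3 used here is purely hypothetical.

```python
def valid_output_size(input_size, kernel_size, stride):
    """Spatial output size of a VALID (no-padding) convolution."""
    return (input_size - kernel_size) // stride + 1

# Second layer: 10 x 10 input, 2 x 2 kernel, stride 1 (values from the text).
layer2 = valid_output_size(10, 2, 1)

# Third layer: stride 2 is from the text; the kernel size is a hypothetical assumption.
ASSUMED_KERNEL = 3
layer3 = valid_output_size(layer2, ASSUMED_KERNEL, 2)
```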
Flame is one of the obvious features of fire. Extracting the characteristic values in the infrared image helps to judge the occurrence of fire. The characteristics of infrared images are generally divided into two types: static features and dynamic features. Dynamic features include circularity, area change rate, etc., while static features include temperature, etc. 26
When a fire develops rapidly, the flame quickly spreads to the surrounding locations and starts fires there, which expands the fire and increases the fire area. This paper discusses the change in circularity of the flame image as a dynamic characteristic. The circularity of the fuel without flame can be compared with the circularity of the flame to quickly find out whether there is a fire in the area. According to the literature, 27 the formula for the circularity is
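A commonly used definition, assumed here, is C = 4πA / P², where A is the area of the region and P its perimeter; some works use the reciprocal instead, so the exact form in the cited literature may differ. The sketch below evaluates this definition on a binary mask, estimating the perimeter by counting pixel sides that face the background, which is one of several possible perimeter estimates.

```python
import math
import numpy as np

def circularity(mask):
    """Circularity 4*pi*A / P**2 of the foreground region of a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    area = int(mask.sum())
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    # Perimeter estimate: number of pixel sides adjacent to background (4-connectivity).
    perimeter = 0
    for shifted in (padded[:-2, 1:-1], padded[2:, 1:-1],
                    padded[1:-1, :-2], padded[1:-1, 2:]):
        perimeter += int(np.sum(mask & ~shifted))
    if perimeter == 0:
        return 0.0
    return 4.0 * math.pi * area / perimeter ** 2

square = np.zeros((20, 20), dtype=bool)
square[5:15, 5:15] = True          # compact 10 x 10 region
strip = np.zeros((10, 60), dtype=bool)
strip[4:6, 5:55] = True            # elongated 2 x 50 region with the same area

c_square = circularity(square)
c_strip = circularity(strip)
```

With this definition a compact region scores higher than an elongated one of the same area, so the value can be compared between flame and flameless images.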

Figure 9. Flameless infrared images of the three fuels: (a) n-heptane; (b) cyclohexane; and (c) carton.
Table 2. Simulation results of the three fuels with and without flame circularity.
It can be seen that the circularity of each burning fuel is greater than the circularity of the fuel without flame, indicating that there is a fire in the area. A burning substance radiates strong heat, and the amount of heat radiation differs greatly between substances. In this paper, temperature is selected as the static characteristic, and the temperature distribution of each fuel is measured by five thermocouples to calculate the fitted temperature-time curve of the fuel. Since an infrared image captures only a certain position of the fuel at a certain moment, equation (6) is used to calculate the average temperature, where B represents the average temperature and f (T, t) represents the temperature as a function of time t over the duration T. The average temperature can replace the point temperature when calculating the amount of thermal radiation.
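One reading consistent with this description is a time average, B = (1/T) ∫ f(T, t) dt taken over the duration T; this form is an assumption based on the stated definitions of B and f (T, t). The sketch below evaluates it numerically with the trapezoidal rule on toy thermocouple samples that are not measured data.

```python
import numpy as np

def average_temperature(times, temps):
    """Time-averaged temperature over the sampled duration (trapezoidal rule)."""
    times = np.asarray(times, dtype=float)
    temps = np.asarray(temps, dtype=float)
    duration = times[-1] - times[0]
    # Area under the temperature curve between consecutive samples.
    integral = np.sum(0.5 * (temps[1:] + temps[:-1]) * np.diff(times))
    return integral / duration

# Toy samples: temperature rises from 20 to 120 and falls back over 20 s.
B = average_temperature([0.0, 10.0, 20.0], [20.0, 120.0, 20.0])
```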
Results
To verify the validity of the algorithm, the experimental indicators are
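One common way to define such indicators, assumed here, uses the counts of true positives (fire frames detected), false negatives (fire frames missed), false positives (nonfire frames flagged) and true negatives. The definitions and names below are illustrative and may differ from the formulas used in the experiments; the counts are toy values.

```python
def detection_metrics(tp, fn, fp, tn):
    """Detection rate, false detection rate and missed detection rate from frame counts."""
    detection_rate = tp / (tp + fn)          # share of fire frames that were detected
    false_detection_rate = fp / (fp + tn)    # share of nonfire frames flagged as fire
    missed_detection_rate = fn / (tp + fn)   # share of fire frames that were missed
    return detection_rate, false_detection_rate, missed_detection_rate

# Toy counts for illustration only.
dr, fdr, mdr = detection_metrics(tp=90, fn=10, fp=5, tn=95)
```

Under these definitions the detection rate and the missed detection rate always sum to 1.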
The cascaded CNN algorithm is implemented in the MATLAB environment. To verify the effectiveness of the model for fire detection, this paper compares the cascaded model with current image detection algorithms in terms of detection rate, false detection rate, missed detection rate, and detection time t. The algorithm of this paper is compared with other CNN-based networks, namely the Alex-Net network and the VGG-16 network. The performance of these methods is analyzed comparatively, and the results are shown in Table 3. C_CNN denotes the cascaded network proposed in this paper.
Table 3. Performance comparison of the three algorithms.
In fire detection research, the detection rate is closely related to fire discovery and fire safety control. Thus, the detection rate is an important indicator of the quality of an algorithm. The results show that the detection rate of the C_CNN method is 95.3%, which is 8.6% higher than the Alex-Net method and 1.7% higher than the VGG-16 method. The false detection rate of the proposed method is as low as 5.6%, which is 2.9% lower than the Alex-Net method and 1.7% lower than the VGG-16 method. The detection efficiency of the cascaded CNN model is higher than that of the Alex-Net and VGG-16 networks. The method proposed in this paper can also reduce the false detection rate in nonfire areas. Especially for outdoor scenes, the method effectively avoids false alarms. The experimental results for the first alarm frame number of the videos are shown in Table 4.
Table 4. Comparison of the frame detected.
The first alarm frame of this method is earlier than that of the Alex-Net network and the VGG-16 network, indicating that the method can detect the fire incident in the video, and hence the damage caused by the fire, earlier. In summary, the cascaded CNN model proposed in this paper performs well in fire detection and improves the detection rate, false detection rate, and detection speed, which demonstrates the advantages of the algorithm. In practice, the fine-tuned C_CNN network can be deployed on single-chip microcomputers, in computer vision applications, and in other scenarios. In aircraft cargo fire detection, we use the network to identify the infrared images, and the first alarm frame agrees well with the thermocouple detection curves, as discussed below.
We compare the detected-frame results with the temperature changes of the three combustion fuels. The temperature data of the three categories of fuels are plotted and curve fitted with the MATLAB curve fitting tool (CFTools), and the results are shown in Figures 10 and 11. The average temperatures of the three combustion fuels measured by the thermocouples are shown in Tables 5–7.
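A comparable fit can be sketched outside MATLAB with a least-squares polynomial. The example below is illustrative only: the quadratic model and the synthetic rise-then-fall temperature data are assumptions, and the fitting model actually used for Figures 10 and 11 is not stated.

```python
import numpy as np

# Synthetic temperature samples that rise and then fall, standing in for a liquid fuel.
t = np.linspace(0.0, 60.0, 13)                 # time in seconds (hypothetical sampling)
temperature = -0.1 * (t - 30.0) ** 2 + 200.0   # synthetic data, not measurements

# Least-squares fit of a second-order polynomial a*t**2 + b*t + c.
coefficients = np.polyfit(t, temperature, 2)
fitted = np.polyval(coefficients, t)

# Time at which the fitted curve peaks: -b / (2a).
peak_time = -coefficients[1] / (2.0 * coefficients[0])
```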

Figure 10. Curve fitting results of the n-heptane and cyclohexane fuel.

Figure 11. Curve fitting results of the carton.
Table 5. Fuel temperature of the n-heptane.
Table 6. Fuel temperature of the cyclohexane.
Table 7. Fuel temperature of the carton.
From the temperature distribution fields measured by the five thermocouples in Tables 5–7, it can be concluded that the temperature of the two liquid fuels first increases and then decreases, while the temperature of the solid fuel decreases. The location of the sudden change in temperature corresponds to the position of the first frame in which the flame is detected. The fuel heat radiation value calculated by equation (7) from the temperature of each substance can be used as a basis for judging which kind of fuel is burning at the time of a fire.
Discussion
Based on the traditional CNN architecture, this paper proposes a two-level cascaded convolutional neural network model. The two networks are used to quickly identify suspicious fire blocks and then extract their deep features. The front-end convolutional neural network is simple in design; it quickly eliminates a large amount of background information and improves detection efficiency. The back-end convolutional neural network further extracts complex flame features and outputs the optimal detection results, which ensures the reliability and detection accuracy of the model. Compared with other algorithms, the proposed method is superior to the Adaboost method and the Fast LBP method in detection rate and detection speed and has good robustness. A GPU can be used to accelerate the algorithm so that its running speed meets the needs of real-time fire detection. In future research, the cascaded model structure will be further optimized to detect the flame area more accurately.
From the MATLAB simulation of infrared images of the different fuels and the thermocouple measurements of their temperature distributions, the following conclusions can be drawn:
(1) The circularity of each burning fuel is greater than the circularity of the material when it is not burning, indicating that a fire has occurred in the area.
(2) According to the temperature distributions measured by the thermocouples, the temperature of the liquid fuels first increases and then decreases, while the temperature of the solid fuel decreases steadily.
(3) The measured temperature distribution fields show that the thermal radiation value decreases from n-heptane fuel to carton fuel, which can be used as a basis for judging which material is burning in the region. This regularity helps to improve fire-fighting facilities and methods and to maximize the efficiency of fire emergency rescue.
Therefore, the combination of simulation and experimental measurement can be used to judge the combustion of different kinds of substances and the burning phenomena they cause, and it provides a reference for fire protection work.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: National Key R&D Program of China (No. 2018YFC0809500), the National Natural Science Foundation of China (Grant No. U1633203, U1733126) and Sichuan Science and Technology Program (No. 2018GZYZF0069).
