Sage Journals: Discover world-class research

Abstract

Railroads are one of society’s fundamental infrastructures, facilitating the transportation of passengers and goods over vast distances. Rail status data is immensely important for ensuring the safe and efficient operation of railroad networks. However, analyzing ultrasonic inspection data is a labor-intensive process and relies heavily on the expertise of experienced inspectors. To detect internal defects of the rail accurately and automatically, this paper proposes a customized image recognition method based on a convolutional neural network with limited B-scan rail image data collected within the industry. The proposed method uses EfficientNet-b7 as the backbone network to fully extract the B-scan rail image data features. With the help of transfer learning and data augmentation techniques, the backbone network is substantially enhanced so that it can understand high-level features of the object without being trained with large-scale B-scan image data. We establish a real-world internal rail defect dataset with 280 B-scan images and test our proposed method. The results reveal that the highest accuracy of the other mainstream CNN-based methods is 76.25% and the accuracy of the traditional method based on a support vector machine classifier trained with Tamura texture and LBP features is 60.00%. Our proposed EfficientNet-b7 model classifies rail defect B-scan images with an accuracy of 85.00%, precision of 84.71%, and recall of 85.00%. Compared to other rail internal defect detection methods, this method is more accurate. With the help of transfer learning and data augmentation, our proposed method achieves better performance and requires less data.

Keywords

Rail defect ultrasonic imaging data machine learning computer vision artificial intelligence

Introduction

Rail will gradually deteriorate due to exposure to the natural environment and the dynamic load of rolling stocks. At a certain point of deterioration, rail breaks or cracks. Broken rails is the leading cause of major derailments and hazardous material release accidents.¹ From 2011 to 2021, there were an average of 20 mainline broken-rail derailments per year on U.S. Class I railroads.² According to the Association of American Railroads, over $24.8 billion was spent on U.S. freight railroad maintenance and capital expenditures in 2017.³ Between 2000 and 2017, around 6,450 freight-train derailments occurred, leading to $2.5 billion worth of rolling stock damage and infrastructure loss.^4,5 Broken rails is also the leading cause of derailments for European Union (EU) countries,^6,7 and the maintenance costs for the 27 EU countries for repairing broken rails are around €2 billion per year.

Train accident data from the Federal Railway Administration of the United States shows that in 2022, 359 train derailment incidents occurred due to rail defects, while track defects were the second most common cause of train derailments that year. In recent years, requirements for rail inspection have considerably increased due to the increasing mileage, traffic, and train speeds.^8,9 In India, rail passenger traffic has increased by almost 200% and freight traffic has increased by 150% since 2000.⁸ Moreover, fewer train operational delays related to rail defect inspection could reduce operational costs. Over $23 billion per year on average was spent on capital expenditures and maintenance expenses related to America’s freight railroads.¹⁰ Thus, a more accurate and efficient rail inspection system is needed to meet the rising safety standards and reduce costs, leading to huge potential yearly savings for railroad inspection and operation.

In the United States, Sperry developed an automatic rail flaw detector car to inspect surfaces and internal rail defects. In internal rail defect inspection, railroad companies utilize ultrasonic sensors to generate ultrasonic echo at various angles to detect internal sections of the rail¹¹ and output B-scan images. B-scan image analysis is the most widely used method for rail inspection. However, such analysis is time-consuming and depends on experienced analysts, which makes large-scale rail defect inspection difficult, amongst other problems. Due to a lack of intelligence, conducting inspections and reporting together in rail defect ultrasonic detection remains challenging.

Literature review

In the past decade, deep learning-based detection and segmentation methods have been successfully applied to many fields of civil engineering.^12–16 Cha^12,13 designed deep learning-based crack damage detection using convolutional neural networks and constructed an autonomous structural visual inspection tool, which uses region-based deep learning to detect multiple damage types.

Researchers have also used thermography and other more sophisticated imaging tools to detect internal defects via deep learning-based detection and segmentation methods,^17,18 and Rahmat used thermography to segment the internal damage of concrete based on an attention-based generative adversarial network and detected subsurface damage of a steel bridge using deep learning and an uncooled micro-bolometer. For pixel-level segmentation, Kang^19,20 developed a semantic transformer representation network and a faster region proposal convolutional neural network for concrete crack segmentation. Choi²¹ proposed a semantic damage detection network for the real-time segmentation of superficial cracks in structures.

In railroad engineering currently, piezoelectric ultrasonic detection is mainly used to detect internal rail defects. The detection process involves specialized equipment, including walking sticks, a dual-track rail flaw detector, and a rail flaw detection vehicle. In accordance with industry standards, the utilization of ultrasonic detection to identify internal rail defects allows for the determination of basic information such as the size and location of the damage primarily through the analysis of B-scan images. However, the B-scan images of internal rail defects are significantly different from the other internal damage. The quality of B-scan images depends on manual operation, so the methods used in other fields may not be suitable to solve the problems studied in this paper. The traditional method extracts ultrasonic image features by manually designing a rail ultrasonic image data feature extractor and then classifying rail defects using machine learning classifiers.^22–24 Researchers have attempted to customize feature extraction of B-scan images and integrate them with machine learning techniques to classify rail defects. Wu²² extracted the color, contour, and other features in the algorithm and put them into the perceptron model for rail defect classification. Huang²³ proposed a B-scan image rail defect algorithm to extract image features based on the Tamura texture feature and local binary patterns (LBP), and to classify rail defects with a support vector machine (SVM). These methods could achieve high accuracy in rail defect classification but would require experienced data scientists to design specific image feature extractors and classifiers, which might lead to low robustness and low generalizability for unseen data.

Recently, convolutional neural networks (CNN) have been introduced to achieve higher rail defect detection accuracy. Sun²⁵ designed a novel multi-stage model based on AlexNet and the DBSCAN method to cut rail B-scan images and classify rail defects. Wang²⁶ proposed that MFDet can conduct multi-scale feature fusion for classifying rail defects based on Faster R-CNN. Chen et al.²⁷ proposed an improved YOLO (you only look once) V3 model to detect internal rail defects using B-scan images. Hu et al.²⁸ utilized ResNet-50 training with transfer learning to classify internal rail defects. However, these methods still require massive data to obtain CNN models that perform well. In the industry, B-scan image data of rail defects is hard to obtain because of rail defects’ rarity and confidentiality. Thus, these methods may underperform other non-CNN methods with limited data. Table 1 summarizes the data requirements and corresponding performances of traditional and CNN-based methods.

Table 1.

Summary of traditional methods and CNN-based methods.

Method	Research	Defect class	Approach	Data volume	Performance
Traditional method	Wu et al²²	6	Perceptron classification with image traits and parameter threshold adjustment method	920	Average accuracy 97%
	Huang et al²³	5	SVM classification with tamura & LBP features	250	Accuracy = 99%
	Xu et al²⁴	2	BP neural network with B-scan data	5,000	Accuracy ≥ 95%
CNN-based method	Sun²⁵	5	A multi-stage model based on AlexNet and DBSCAN	300,000	Detection rate ≥ 80%
	Wang²⁶	8	MCDet and MFDet	10,000	Accuracy ≥ 87.68%
	Chen et al²⁷	4	Improved YOLO V3	4,500	mAP = 87.41%
	Hu et al²⁸	5	ResNet-50 with transfer learning	5,000	Accuracy = 99.3%

Contributions

Developing and customizing a deep learning model for internal rail defect classification based on B-scan images faces several challenges. The novelty of this study is that it provides a clear, big picture of the field, and an implementable and generalized solution for these challenges. Currently, a comprehensive investigation of B-scan images for internal rail defects is missing. This study reviews current nondestructive rail diagnostics technology, common internal rail defects, and corresponding B-scan image characteristics.

The existing deep learning methodologies for internal rail defect detection can achieve acceptable performance with a large volume of high-quality data, but risk severe overfitting. Collecting high-quality rare data of negative class, such as B-scan images of internal defects in rails, is challenging and expensive. In this research, we collected four types and up to 280 rail internal defect B-scan images from the industry, which were obtained by dual-rail ultrasonic flaw detectors. To cope with this small industry dataset, we customized and modified dense layers of the EfficientNet-b7 model and then used transfer learning to optimize the model, which substantially enhanced its capacity to understand high-level image traits and fit for the downstream task of rail defect detection in B-scan images. This method allows a model trained on a customized small dataset to achieve great performance.

It is especially hard to gather enough images of all the types of internal rail defects for model training. The low number of negative samples will lead to unbalanced dataset and degrade model. Therefore, we developed a novel internal rail defect classification algorithm based on the modified EfficientNet-b7 and data augmentation, with the aim of classifying internal rail defects on the B-scan images without requiring too much training data, achieving better accuracy. The experiment’s results indicate that in rail ultrasonic data analytics, even with limited and unbalanced industry data, a promising CNN-based internal rail defect detection model can be established with the help of a transfer learning technique and data augmentation.

Material and methods

Material

The B-scan images of common internal rail defects and their characteristics

Internal rail defects are defects that develop within the rail. These include rail transverse defects, bolt hole defects, rail bottom cracks, rail fillet transverse cracks, and more.²⁹ According to the ultrasound echo position, the primary types of rail defects are rail transverse defect, bolt hole defect, horizontal crack, vertical crack, oblique crack, and bottom crack. In our research, over 90% of collected B-scan defect data are rail transverse defects and bolt hole defects, as shown in Table 2. Thus, we classify common rail defects as rail transverse defect, bolt hole defect, and other defects. This section will summarize the characteristics of the B-scan images for the primary types of rail defects. The color of each ultrasound detector probe and the corresponding parameter settings of the B-scan are depicted in Figure 1. For instance, purple indicates channel A and the probe is anterolateral to the rail.

Table 2.

B-scan images used for training and testing.

Class	Train	Test
Rail transverse defect	97	43
Bolt hole defect	41	11
Other defects	22	14
Normal	40	12
Total	200	80

Figure 1.

Ultrasound detector probe color and corresponding parameters.

Rail transverse defect

Rail transverse defect is the primary internal rail defect. The size of the rail transverse defect will directly affect the strength of the rail and pose a threat to driving safety. Therefore, it is important to identify the rail transverse defect. In the B-scan image data we collected, rail transverse defects represent the most significant proportion. B-scan images of rail transverse defects are shown in Figure 2 (a). The echo appears near the rail head line. This type of defect is detected by anteromedial and posteromedial 70° probes, with three possible situations: (1) the anteromedial and posteromedial 70° probes appear to echo at the same time; (2) only the anteromedial 70° probe appears to echo; or (3) only the posteromedial 70° probe appears to echo. In summary, the rail transverse defect is detected by 70° probes and the echoes all appear near the rail head line.

Figure 2.

B-scan images of (a) rail transverse defect and (b) bolt hole defect.

Bolt hole defect

Bolt hole defects can be detected by the forward and backward 37° straight probes and the 0° probe. The echo appears near the bolt hole line that is situated in the middle section of the B-scan image. The bolt hole defect is detected when the forward 37° or backward 37° probe echo exceeds the lower boundary of the bolt hole. If the 0° probe echo is abnormal in the upper boundary or in the lower boundary of the bolt hole line, a bolt hole defect also appears. B-scan images of bolt hole defects are depicted in Figure 2 (b).

Other internal rail defects

Beyond rail transverse defects and bolt hole defects, other main internal rail defects include rail head horizontal crack, rail end horizontal crack, rail jaw horizontal crack, and rail bottom transverse crack, as shown in Figure 3.

Figure 3.

B-scan images of other rail defects: (a) rail head horizontal crack, (b) rail end horizontal crack, (c) rail jaw horizontal crack, and (d) rail bottom transverse crack.

Most defects appear near the bolt hole position and their detailed classification depends on the 0°, the forward 37°, and backward 37° probe echo. If the abnormal echoes appear near the head line, the defects will be classified as rail head horizontal crack or rail end horizontal crack. If the abnormal echoes appear near the bolt hole line, a rail jaw horizontal crack is indicated. If the bottom of the rail oscillates and abnormal echoes appear, a rail bottom transverse crack is detected.

Methods

A general CNN includes the data input layer, followed by the convolutional layer, pooling layer, fully connected layer, and the final output layer. CNN extracts the essential features of the input image through multi-convolutional computations for the subsequent decision. However, most features observed in B-scan images of rail defects exhibit high degrees of similarity and insignificance. Consequently, traditional CNN encounters challenges when extracting meaningful features for accurate classification of rail defects in B-scan images. Additionally, a well-trained CNN relies heavily on the availability of a large dataset, which is often limited in various scenarios, including medical image analysis. To address this issue, data augmentation is proposed to augment the dataset by incorporating geometric transformations, color changes, rotations, zooming, and other similar operations.³⁰ In this research, we enlarge our dataset during the model training process. In the traditional CNN training process, the input dataset will go through the model, and the model will update tremendous weights of convolutional layers and others by a backpropagation algorithm for better training accuracy. However, when applied to smaller datasets, the weight number of the CNN model is far larger than the number of training data instances. Overfitting will emerge quickly during the training process and degrade the model. Such a model will not fully understand the patterns and features of a small dataset. The transfer training technique is proposed³¹ to conduct domain adaptation for the CNN models and to ease the negative impacts of overfitting. With the transfer training technique, a CNN model can enhance understanding of high-level features and achieve acceptable training performance with a large dataset like ImageNet.³² Users can customize this pre-trained model for various tasks and fine-tune it on a given small dataset.

In this research, we selected EfficientNet as our backbone network, which is widely used in computer vision. Wang et al.³³ modified EfficientNet-b3 to detect lymph node metastases in breast cancer. Zhang et al.³⁴ proposed LightWeightTextureNet based on EfficientNet to examine ultrasonic images for material texture classification. These studies show that the application of EfficientNet to ultrasonic rail defect data is promising. Therefore, EfficientNet-b7 was selected to build the classification model for internal rail defects in B-scan images using transfer learning and data augmentation techniques.

EfficientNet

In the deep CNN model training process, network depth, width, and resolution settings are crucial. He et al. proposed ResNet, consisting of residual blocks, pooling layers, and dense layers.³⁵ Residual block, which is a stack of convolutional layers with multi-scale and other layers, allows ResNet to achieve incredible accuracy. This appropriate and deeper layer combination allows the CNN model to achieve better optimization faster. However, finding such a combination requires extensive computational resources and human efforts, which are not available with limited resources. Unlike the traditional deep CNN model, EfficientNet can scale the model with three dimensions (depth, width, and resolution) using a compound coefficient.^36,37 To find the compound coefficients, the general convolutional network N and the following function are defined. Additionally, the memory and flop limits are set for the convolutional network N. The typical N comprises multiple sub-networks. Here we define a sub-network as one stage of the N which is represented by X. In the function N, the stage will be repeated L_k times, where k denotes the stage. For each stage, we represent the input of the stage with CNN depth, width, and resolution under corresponding constraints. The goal is to maximize the accuracy with limited training resources.

\max_{d, w, r} A c c u r a c y (N (d, w, r)) s . t . N (d, w, r) = ⨀_{k = 1 \dots s} F_{k}^{d ∙ L_{k}} (X_{< r \cdot H_{k}, r \cdot W_{k}, r \cdot C_{k} >})

(1)

Where d,w,r are the coefficients of the scaling network: depth, width, and resolution. $F_{k}, L_{k}, H_{k}, W_{k}, C_{k}$ represent the preset parameters in the network.

D e p t h : d = α^{\emptyset}

W i d t h : w = β^{\emptyset}

R e s o l u t i o n : r = γ^{\emptyset}

s . t . {\begin{cases} P r e s e t_m e m o r y_l i m i t \geq M e m o r y (N) \\ P r e s e t_F L O P S_l i m i t \geq F L O P S (N) \end{cases}

s . t . {\begin{cases} α \cdot β^{2} \cdot γ^{2} \approx 2 \\ α \geq 1, β \geq 1, γ \geq 1 \end{cases}

(2)

To determine the compound coefficient, EfficientNet selected the mobile inverted bottleneck convolution (MBConv) module as the backbone network, which consists of a 1 × 1 convolution layer, a convolution layer with a larger stripe, and multi-batch normalization layers, with the stripe number of the middle convolution layer inside MBConv being 3 or 5. The structure of MBConv is depicted in Figure 4(a). EfficientNet-b7 is found using grid search and includes 55 MBConv modules, two convolution layers, one global average pooling layer, a fully connected layer, and a softmax classification layer. The network architecture of EfficientNet-b7 is depicted in Figure 4(b).

Figure 4.

The structure of (a) MBConv and (b) EfficientNet-b7.³⁶

Transfer learning and data augmentation

Current CNN-based networks require massive high-quality data to achieve satisfactory accuracy. Lab-constructed data can partially solve this problem and meet basic training needs. However, these models are often inaccurate when used in the real-world due to data bias. Further, collecting high-quality rare data of negative class is challenging and expensive in infrastructure industries. The problem of unbalanced data will emerge. Transfer learning aims to transfer knowledge from the source domain to the target domain.³⁰ With the help of transfer learning technique, CNN models can improve their model capacity, and fine-tune on the small and customized. Additionally, to make our dataset more balanced, we enlarge rare data of negative class in our dataset using the data augmentation technique during the training process. Data augmentation is a commonly used method to increase the amount of data for training data, which is used to prevent overfitting and improve the performance of deep learning models.³⁰ Data augmentation commonly used in computer vision includes cropping, flipping, rotation, scaling, etc. In our research, we select cropping, flipping, rotation, and scaling to enlarge and balance our dataset because this could retain most of the valuable information while introducing less bias. Overall, we introduce transfer learning and data augmentation techniques in our training process to ease the impact of lacking data and unbalanced data.

Our dataset contains 280 images, which is far below the number of EfficientNet-b7 weights, which is over 66 million. The EfficientNet-b7 would overfit if only trained with our small dataset, and the accuracy and generalization ability of the model would significantly degrade. To avoid such a situation, we first pre-trained the EfficientNet-b7 with ImageNet. After training with large dataset, the pre-trained EfficientNet-b7 achieved strong generalization ability and could fully extract and understand high-level image features. We froze the weights of the input and middle layers with this network and modified the final output dense layers to fit our task. Finally, the customized pre-trained network was fine-tuned on our dataset, and the ideal network was obtained after multiple trainings.

Experiments and discussion

This section first applies our proposed customized EfficientNet-b7 network and then compares it with other methods. The computational hardware resources include an Intel(R) Xeon(R) i7-7700K@ 4.20 GHz CPU, 32 GB of 3200 MHz memory, and an NVIDIA GTX 1080Ti GPU, and the software environments of our machine include Windows 10, Python 3.5.5, Opencv 4.0.0, Cuda 9.2, Cudnn 7.2.1, Pytorch 1.4.0, and other Python packages.

Dataset

All B-scan image data was provided by a Chinese rail agency. LabelImg is a graphical image annotation tool written in Python which uses Qt for its graphical interface. Since the original length of the raw data is too long, it is not appropriate to input into the model. Images will therefore be cut to a size of 100 pixels × 400 pixels by LabelImg. A sample image is shown in Figure 5.

Figure 5.

Sample image.

After cutting the images, three types of typical internal rail defects are selected. They are rail transverse defect, bolt hole defect, and other defects. The numbers of B-scan images used for training and testing of all classes are included in Table 2.

Evaluation metrics and definitions

The following is a list of abbreviations for model evaluation metrics and definitions used in this research. •

TP: True Positive

•

TN: True Negative

•

FP: False Positive

•

FN: False Negative

•

Acc: Accuracy is the ratio of true classifications among all classifications. $A c c = \frac{T P + T N}{T P + T N + F P + F N}$ .

•

P: Precision is the ratio of true positive classifications among true positive and false positive classifications. $P = \frac{T P}{T P + F P}$ . In the multi-classification, Weighted-P is mainstream metric for evaluating precision. $W e i g h t e d - P = \sum_{1}^{n} w^{i} \times P_{i}$ , where $w^{i} = \frac{N u m b e r o f s a m p l e s i n c l a s s i}{T o t a l n u m b e r o f s a m p l e s}$ .

•

R: Recall is the ratio of true positive classifications among true positive classifications and false negative classifications. $R = \frac{T P}{T P + F N}$ . In the multi-classification, Weighted-R is the mainstream metric for evaluating recall. $W e i g h t e d - R = \sum_{1}^{n} w^{i} \times R_{i}$ , where $w^{i} = \frac{N u m b e r o f s a m p l e s i n c l a s s i}{T o t a l n u m b e r o f s a m p l e s}$ .

•

F1: F1-score is the harmonic mean of precision and recall. $F 1 = \frac{2 * P * R}{P + R}$ .

•

G-Mean: The Geometric Mean is square root of precision and recall. $G - M e a n = \sqrt{P * R}$ .

Training hyperparameters configuration of proposed customized efficientNet-b7

In intensive fine-tuning experiments, the optimal combination of hyperparameters was obtained via 200 training images and 80 test images. The loss function we utilized is cross-entropy. The network hyperparameters include learning rate, weight decay, epoch, momentum, and batch size, and their values are 0.0001, 0.00,001, 100, 0.9, and 32, respectively.

Traditional approach example

Texture traits are crucial for recognizing objects.³⁸ Researchers have conducted various trials combining texture traits of B-scan images and traditional machine learning classifiers to classify internal rail defects.^39,40 Based on the pattern of human visual perception, researchers designated six textural features known as Tamura texture features: coarseness, contrast, directionality, line-likeness, regularity, and roughness.³⁸ However, the texture features of images in the industry are not smooth enough to process. LBP is proposed to smooth the images. Huang²³ proposed an internal rail defect classification model using B-scan image data based on an image processing algorithm, which utilizes Tamura texture features and an LBP combined algorithm to extract the feature vectors of the rail defect B-scan images. Then, these feature vectors are trained with a solid SVM for classifying rail defects. The framework of this method is depicted in Figure 6. The results from Huang²³ show that such an SVM classifier can classify five classes of internal rail defects in B-scan images with high accuracy, and the algorithm is different from image matching and simple rules-based methods. It can continuously learn and train according to the different training sample sets provided, to obtain the optimal classifier suitable for the current training sample set. This mechanism of their method differs from the CNN-based method and achieves high accuracy in their dataset.

Figure 6.

The framework of Huang’s method.²³

CNN-based methods

Sun²⁵ proposed a novel multi-stage model based on AlexNet and the DBSCAN method to classify rail B-scan images. AlexNet is a deep convolutional neural network architecture and consists of eight layers, including five convolutional layers and three fully connected layers.²⁵

Chen et al.²⁷ proposed an improved YOLO V3 target detector algorithm to classify internal rail defects using B-scan images. DarkNet-53 is the backbone network for YOLOv3. DarkNet-53 consists of 53 convolutional layers, including shortcut connections and residual blocks.

With deeper convolutional layers, traditional CNN doesn’t achieve high accuracy. The deeper convolutional layers make it hard for the model to converge optimally and introduce the vanishing gradient problem. Hu et al.²⁸ modified ResNet-50, one variant of ResNet, for classifying internal rail defects in B-scan images. ResNet-50 as trained on the dataset from Hu outperforms some traditional machine learning methods.²³ The general ResNet-50 model is comprised of 48 convolutional layers, one max pooling layer, one average pooling layer, and a fully connected output layer. The structure of the ResNet-50 model is depicted in Figure 7.

Figure 7.

ResNet-50 network architecture.²⁸

In this research, we selected the CNN-based model pre-trained on the ImageNet dataset and fine-tuned it on our dataset with data augmentation. With the help of transfer learning, we froze convolutional layers, max-pooling layers, and average pooling layers and customized the dense layer and the final output layer for the fine-tuning of our dataset. Extensive experiments were conducted on the fine-tuning network and initial network, and the results are analyzed in the following sections.

Comparison

Model evaluation of traditional method and CNN-based methods

The experiment results of SVM with Tamura texture and LBP features from Huang, AlexNet model, DarkNet-53 model, ResNet-50 model, and EfficientNet-b7 are depicted in Table 3. The results show that EfficientNet-b7 with transfer learning techniques achieves the highest accuracy. The accuracies of CNN-based methods with transfer learning techniques are far greater than those of SVM with Tamura texture and LBP features.

Table 3.

Result comparison on testing dataset.

Method	Accuracy (%)	Recall (%)	Precision (%)	F1-score	G-mean
SVM with tamura texture and LBP features (Huang et al., 2020)	60.00	60.00	59.62	0.56	0.59
AlexNet (Sun, 2019)	52.50	54.49	41.13	0.43	0.47
AlexNet with transfer learning (Sun, 2019)	67.50	51.91	68.48	0.55	0.60
DarkNet-53 (Chen et al., 2021)	66.25	71.25	62.24	0.62	0.67
DarkNet-53 with transfer learning (Chen et al., 2021)	75.00	75.00	80.71	0.78	0.78
ResNet-50 (Hu et al., 2021)	67.50	67.50	48.50	0.56	0.57
ResNet-50 with transfer learning (Hu et al., 2021)	76.25	76.25	65.84	0.70	0.71
EfficientNet-b7	78.75	78.75	77.50	0.76	0.78
EfficientNet-b7 with transfer learning (proposed method in this paper)	85.00	85.00	84.71	0.85	0.84

Huang’s method relies on manually designed image features and SVM for classification. The goal of SVM is largely to separate the data points into a much higher-dimensional decision space. The dataset we utilize in our study was sourced from the industry and contains a considerable amount of noise signal. This noise signal poses a challenge for Huang's method to effectively extract the valuable and significant features from our dataset solely relying on the Tamura texture feature and LBP. The insufficient feature set doesn’t allow the following SVM classifier to obtain the optimal maximum interval hyperplane to classify the internal rail defect, which means that plenty of data points lie on the decision boundary of this SVM classifier. More feature extraction methods need to be introduced and properly designed to fit more scenarios.

Model evaluation of other CNN-based models and proposed EfficientNet-b7

The CNN-based methods in this research, except for AlexNet, achieve higher accuracy than the traditional method. However, ResNet-50 achieves 76.25% accuracy on our dataset only by deepening the network with fixed height, width and resolution, which means some crucial B-scan image feature information will be ignored, leading ultimately to wrong classifications. Trying to capture most of the significant image features, our modified EfficientNet-b7 selected a better compound coefficient combination among depth, width, and resolution. EfficientNet-b7 keeps the image information in multi-scale more reasonably and reaches the desired coverage in fewer steps. Compared to other CNN-based models, the capacity of our proposed model demonstrates an improvement, and these characteristics allow our modified EfficientNet-b7 to achieve the best accuracy. Even without the help of transfer learning, our modified EfficientNet-b7 can achieve a higher level of accuracy than other CNN-based models trained with transfer learning.

As shown in Table 3, EfficientNet-b7 with transfer learning achieves the highest accuracy, recall, and precision. In some scenarios of model evaluation, precision and recall can be contradictory. Thus, the F1-score is selected for the overall evaluation of models. Additionally, we compared the G-Mean of these models. EfficientNet-b7 with transfer learning also achieves the highest F1-score and the best G-Mean out of these models. This shows that our proposed method achieves better performance on multi-class classification and classifies the B-scan internal rail defects more accurately.

Conclusions

Traditional machine learning classifiers with manual feature extraction can achieve high accuracy on certain dataset, but struggle with general dataset compared to CNN models. The goal of manual feature extraction is to identify morphologic characteristics of rail defect B-scan data. Optimal feature selection is based on detecting differences between defects. Proper feature selection increases detection rate and speed, but designing a good extractor is time-consuming and may only work for specific data. CNN models extract features directly from data and update independently, eliminating the need for manual extraction. While tremendous and balanced datasets are necessary for training, obtaining such data is challenging in the industry.

In this paper, we collect and standardize the B-scan image dataset of internal rail defects, a specific scenario in the ultrasound field. Because of the rarity of internal rail defects compared with normal data, this dataset is unbalanced and will lead to significant bias during the training process. We conducted data augmentation during the training process and combined it with transfer learning techniques to ameliorate our dataset's lack of data and unbalanced data. Our investigation reveals that, with an unbalanced and low volume of B-scan rail data in the industry, we can develop a high-performance CNN model with the help of data augmentation and transfer learning techniques.

The results show that even with a small dataset, the rail defect classification algorithm based on EfficientNet-b7 with transfer learning and data augmentation techniques differs from simple image pattern matching and can be applied to the field of rail internal defect detection. After some trials, it appears that EfficientNet architecture with data augmentation could lead to the development of a feasible model without requiring too much training data. During the fine-tuning process of transfer learning, our model can obtain a good initial understanding of images and achieve good classification on our small dataset. The proposed EfficientNet-b7 model classifies internal rail defect on B-scan images with an accuracy of 85.00%, precision of 84.71%, recall of 85.00%, F1-score of 0.85, and G-mean of 0.84, which shows its superior performance on our small dataset and model capacity compared to traditional methods and other CNN models.

In the future, we intend to work with site engineers in railroad companies to deploy our proposed methods with ultrasonic flaw detectors and achieve real-time internal rail defect detection. Although our proposed method shows great potential in ultrasonic rail defect detection with less data needed, as compared to mainstream approaches, more real-world ultrasound rail defect data would be beneficial to improve our model’s capacity. Furthermore, our methods can be applied to other image-based industrial evaluations with rare image dataset, such as fault diagnosis in the electric sectors using thermal images.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Guoqing Jing

Xiang Liu

References

Liu

Saat

Barkan

CPL

. Analysis of causes of major train derailment and their effect on accident rates. Transport Res Rec. 2012; 2289(1): 154–163. DOI: 10.3141/2289-20

Federal railroad administration office of safety analysis. 2.09 - train accidents and rates. 2022. https://safetydata.fra.dot.gov/OfficeofSafety/publicsite/query/TrainAccidentsFYCYWithRates.aspx. (accessed 5 July 2022).

Irani

Siers

Menking

, et al.

Economic and fiscal impact analysis of class

I railroads 2017. 2018. https://www.aar.org/wp-content/uploads/2018/11/AAR-Class-I-Railroad-Towson-Economic-Impact-October-2018.pdf (accessed 5 July 2022).

Federal Railway Administration Rail . Interactive rail defect remediation table. 2018. https://frarail.wordpress.com (accessed 5 July 2022).

Zhou

. Artificial intelligence-aided prediction of broken rail-caused derailment risk. USA: Doctoral dissertation, Rutgers University-School of Graduate Studies, 2020.

European Union Agency for Railways . Railway safety in the European union – Safety overview 2017. 2018. https://data.europa.eu/doi/10.2821/474487 (accessed 5 July 2022).

European Commission . AutoScan – rail inspection by autonomous systems. 2022. https://cordis.europa.eu/project/id/720506 (accessed 5 July 2022).

International Energy Agency . The future of rail: opportunities for energy and the environment. 2019. https://iea.blob.core.windows.net/assets/fb7dc9e4-d5ff-4a22-ac07-ef3ca73ac680/The_Future_of_Rail.pdf (accessed 5 July 2022).

European Commission . White paper, Roadmap to a Single European Transport Area – towards a competitive and resource efficient transport system. 2011. https://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=COM:2011:0144:FIN:en:PDF (accessed 24 May 2023).

10.

Association of American Railroads . Freight rail’s investments generate huge returns for America. 2023. https://www.aar.org/issue/freight-rail-investments (accessed 24 May 2023).

11.

Network Rail . Ultrasonic rail inspection. 2019. https://www.networkrail.co.uk/wp-content/uploads/2019/06/Ultrasonic-Rail-Inspection-v1.pdf (accessed 5 July 2022).

12.

Cha

Choi

Büyüköztürk

. Deep learning-based crack damage detection using convolutional neural networks. Comput Aided Civ Infrastruct Eng 2017; 32(5): 361–378.

13.

Cha

Choi

Suh

, et al. Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Comput Aided Civ Infrastruct Eng 2018; 33(9): 731–747.

14.

Alawad

Kaewunruen

. A deep learning approach towards railway safety risk assessment. IEEE Access 2020; 8: 102811–102832.

15.

Jing

Qin

Wang

, et al. Developments, challenges, and perspectives of railway inspection robots. Autom ConStruct 2022; 138: 104242.

16.

Zaman

Huang

, et al. Artificial intelligence-aided grade crossing safety violation detection methodology and a case study in New Jersey. Transportation Research Record, 2023.

17.

Ali

Cha

. Attention-based generative adversarial network with internal damage segmentation using thermography. Autom ConStruct 2022; 141: 104412.

18.

Ali

Cha

. Subsurface damage detection of a steel bridge using deep learning and uncooled micro-bolometer. Construct Build Mater 2019; 226: 376–387.

19.

Kang

Benipal

Gopal

, et al. Hybrid pixel-level concrete crack segmentation and quantification across complex backgrounds using deep learning. Autom ConStruct 2020; 118: 103291.

20.

Kang

Cha

. Efficient attention-based deep encoder and decoder for automatic crack segmentation. Struct Health Monit 2022; 21(5): 2190–2205.

21.

Choi

Cha

. SDDNet: Real-time crack segmentation. IEEE Trans Ind Electron 2019; 67(9): 8016–8025.

22.

, et al. Train rail defect classification detection and its parameters learning method. Measurement 2020; 151: 107246.

23.

Huang

Luo

Wang

, et al. Research on classification of rail defects based on image processing algorithm. Electr Drive Locomotives 2020; 4: 41–46. DOI: 10.13890/j.issn.1000-128x.2020.04.100

24.

Zhao

Wang

, et al. Rail Defect Detection Method Based on BP Neural Network. The 10th International Conference on Computer Engineering and Networks. CENet 2020, 2021. DOI: 10.1007/978-981-15-8462-6_8

25.

Sun

. Research on detection and judgment of rail damage based on intelligent analysis of time and space detection data. China: Doctoral dissertation, Beijing Jiaotong University. 2019.

26.

Wang

. Intelligent Recognition Method for Rail Damage Defects of Heavy-load Railway Based on B-scan Image. Master’s thesis. China: Beijing Jiaotong University. 2020.

27.

Chen

Wang

Yang

, et al. Deep learning for the detection and recognition of rail defects in ultrasound B-Scan images. Transport Res Rec. 2021; 2675(11): 888–901. doi:10.1177/03611981211021547

28.

Qiu

, et al. Ultrasonic detection and classification for internal defect of rail based on deep learning. J China Railw Soc 2021; 4: 108–116.

29.

Xiong

Jing

Wang

, et al. Detection of rail defects using NDT methods. Sensors. 2023; 23: 4627. DOI: 10.3390/s23104627

30.

Mikołajczyk

Grochowski

Data augmentation for improving deep learning in image classification problem. 2018 international interdisciplinary PhD workshop (IIPhDW). IEEE, 2018; 117–122.

31.

Yosinski

Clune

Bengio

, et al.

How transferable are features in deep neural networks?

Adv Neural Inf Process Syst 2014; 27.

32.

Deng

Dong

Socher

, et al. A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255, DOI: 10.1109/CVPR.2009.5206848

33.

Wang

Liu

Xie

, et al. Boosted efficient net: detection of lymph node metastases in breast cancer using convolutional neural networks. Cancers. 2021; 13(4): 661. DOI: 10.3390/cancers13040661

34.

Zhang

Wang

Saniie

Deep convolutional neural networks applied to ultrasonic images for material texture recognition. 2020 IEEE international ultrasonics symposium (IUS). IEEE, 2020, 1–3.

35.

Zhang

Ren

, et al. Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition 2016: 770–778.

36.

Tan

LeEfficientnet

. Rethinking model scaling for convolutional neural networks. In: International conference on machine learning. PMLR, 2019, pp. 6105–6114.

37.

Gupta

TanEfficientNet-EdgeTPU

. Creating Accelerator-Optimized Neural Networks with AutoML. Google AI Blog. 2019. https://ai.googleblog.com/2019/08/efficientnet-edgetpu-creating.html (accessed 5 July 2022).

38.

Tamura

Mori

Yamawaki

. Textural features corresponding to visual perception. IEEE Transactions on Systems, man, and cybernetics 1978; 8(6): 460–473.

39.

Sulimova

Zhukov

Krasotkina

, et al. Automatic rail flaw localization and recognition by featureless ultrasound signal analysis. In: International conference on machine learning and data mining in pattern recognition. Cham: Springer, 2018, pp. 16–27.

40.

Andrey

Vitaliy

. Machine learning in problems of automation of ultrasound diagnostics of railway tracks, 2020. arXiv preprint arXiv:2006.16844.

Artificial intelligence-aided detection of rail defects based on ultrasonic imaging data

Abstract

Keywords

Introduction

Literature review

Contributions

Material and methods

Material

The B-scan images of common internal rail defects and their characteristics

Rail transverse defect

Bolt hole defect

Other internal rail defects

Methods

EfficientNet

Transfer learning and data augmentation

Experiments and discussion

Dataset

Evaluation metrics and definitions

Training hyperparameters configuration of proposed customized efficientNet-b7

Traditional approach example

CNN-based methods

Comparison

Model evaluation of traditional method and CNN-based methods

Model evaluation of other CNN-based models and proposed EfficientNet-b7

Conclusions

Footnotes

Declaration of conflicting interests

Funding

ORCID iDs

References