Abstract
Recently, in the field of structural health monitoring, the detection of bolted connection looseness through percussion-based methods and machine learning has received much attention because it removes the need for sensor installation and has potential for automation. However, few such studies have been performed in the underwater environment. This paper proposes a new method, Feature-reduced Multiple Random Convolution Kernel Transform (FM-ROCKET), to identify the looseness level of underwater bolted connections based on percussion-induced sound (audio signal). By integrating deep learning (DL) and shallow learning, the FM-ROCKET model uses a 1D convolutional layer (a DL method) to extract features from the percussion-induced audio signal and adopts a ridge classifier (a linear classifier, a shallow learning method) to classify the features. Five different preload levels of the bolted flange are considered. A hammer is used to tap the flange surface, and the continuous percussion-induced audio signal is collected by a smartphone in an underwater environment. After audio signal segmentation, single-hit audio signals are fed into the FM-ROCKET model. To verify the effectiveness of the proposed method, three case studies are conducted on two flanges. In case study I, the proposed method slightly outperforms other DL-based methods under different training/test splitting ratios. In case studies II and III, the proposed method is far more effective than other DL-based methods on independent and different test sets. The results demonstrate the superiority of the FM-ROCKET model in the underwater detection of bolted flange looseness. To the best of our knowledge, this article is the first attempt to address the detection of bolted flange looseness in the underwater environment by combining the percussion-based method, DL, and shallow learning.
Keywords
Introduction
Since the early industrial age, bolted connections have been a reliable way of joining structural components and have been widely used in many industries. 1 For example, the offshore oil industry employs pipelines with flange connections to transport oil from the seabed to land. 2 Underwater pipelines, which stretch for many miles, are often joined by bolted flanges. However, such connections still suffer from problems such as bolt looseness, which may result from chemical erosion, mechanical vibration, impact from foreign objects, and improper installation.3,4 That is, in a pipeline system, a bolted connection represents a point of vulnerability and is prone to self-loosening due to uncertainties, which might lead to disastrous consequences and economic losses. Therefore, it is important to detect bolt looseness in subsea flanges.
In the past decades, benefitting from the rapid development of structural health monitoring (SHM),5–7 researchers have contributed to the detection of bolted connection looseness in the air.8–10 Notably, detection approaches based on piezoelectric materials 11 have been developed. By analyzing the energy dissipation caused by tangential damping, Wang et al. 12 presented an active sensing method for quantitative monitoring of bolt looseness. Later, Wang et al. 13 developed a novel electromechanical impedance model for monitoring bolt looseness, which describes the relationship between the mechanical impedance of the bolted joint and the electrical impedance of a piezoceramic patch mounted on the joint. Then, by combining spectral sidebands and high-order harmonics, Zhang et al. 14 introduced a contact acoustic nonlinearity-based monitoring system for detecting bolt looseness. Meyer and Adams 15 used the impacted-acoustic modulation method to detect bolt looseness, where the impact modulation results are quantified using an integration-based metric that increases as the preload force on the bolts decreases. Moreover, Li and Jing 16 proposed a novel second-order output spectrum-based method to detect multi-bolt loosening faults in complex structures with a sensor chain. By introducing machine learning techniques, Wang and Song 17 further developed a novel vibro-acoustic method for bolt looseness detection, which outperforms the traditional vibro-acoustic method. Although the above detection methods have demonstrated their effectiveness, they were all implemented in the air and might be limited under water because they require constant contact between structures and transducers. In addition, sensor installation requires additional human labor and financial costs in some complex situations.
Taking advantage of computer vision technology, researchers have developed non-contact detection methods for bolted connection looseness. Cha et al. 18 applied image processing and a support vector machine (SVM) to bolt looseness detection, where feature classification is based on the horizontal and vertical lengths of the bolt heads. However, this method requires that the bolts be located in the middle of the image and that, for each test, the bolted connections have the same layout. Ramana et al. 19 improved the method of Cha et al. 18 by using the Viola–Jones algorithm to automatically localize the bolt in the image. Recently, deep learning-based (DL-based) methods20–23 that discriminate the rotation angle of the nuts were proposed for identifying loose bolts. Nonetheless, these vision-based studies all overlook the fact that, in the early stage of looseness, a bolted connection may show no visible change in the rotation angle or position of the nut. Moreover, because of refraction, reflection, and shadows in the underwater environment, the camera view may become blurred and fail to provide accurate images.
To overcome the above drawbacks, the non-destructive percussion-based method,24,25 an ancient but effective technique, has attracted renewed attention in recent years and has achieved promising performance in the detection of bolted connection looseness. Kong et al. 26 proposed a new percussion-based approach to determine the preload level of bolted joints using a decision tree. Zhang et al. 27 combined the mel-frequency cepstral coefficients (MFCC) feature matrix with the principal component analysis (PCA) components of the percussion-induced audio signal, and then used these feature representations to train and test an SVM model. Next, with the rapid development of artificial intelligence, more DL technologies have been applied to identify percussion-induced audio signals. Yuan et al. 28 employed multiscale-entropy analysis to extract underlying characteristics and fed them into a back propagation neural network for training and testing. Moreover, Wang and Song 29 developed the 1D training-interference CapsNet, which combines feature extraction and classification. Similarly, the percussion-based method has also been employed to detect damage in other structures, such as cup-lock scaffolds, 30 aluminum spatial structure, 3 and timber columns. 31
Although many articles have reported on the detection of bolted connection looseness by percussion-based methods, most of the studies were conducted in the air. A few articles32–34 focused on underwater bolt looseness detection; however, they did not use the percussion-based method and paid more attention to the design of the detection device than to the detection technology. Jiang et al. 35 conducted a feasibility study of subsea bolt looseness detection using piezoceramic transducer-enabled active sensing; however, they only analyzed the lead zirconate titanate (PZT) signals under different preload levels and revealed the differences between these signals. Subsequently, based on active PZT sensing and entropy theory, Wang et al. 36 proposed a stacking-based ensemble learning method to detect bolted connection looseness under water; however, only two preload levels (tightened and loose) were considered and sensor installation was required in the underwater experiments.
Overall, the literature above presents many novel methods, including signal processing, machine learning, and DL technologies, that have advanced the detection of bolted connection looseness. However, three issues have not received the desired attention in current research. First, most detection studies using percussion-induced sound achieved satisfactory results in the air, while their models might not obtain the desired results under water. Second, current sensor-based underwater detection considers only two preload levels (tightened and loosened). Third, most percussion-based detection studies train and test their models on a single dataset with a certain training/test splitting ratio, while the performance of their models on independent test sets has not been fully verified. In other words, neither the robustness of the models to environmental and operational variants nor their robustness to different detection objects with similar structure has been fully verified.
To address the above three problems, we propose a new strategy to detect bolted flange looseness in the underwater environment using the percussion-based method and Feature-reduced Multiple Random Convolution Kernel Transform (FM-ROCKET). The FM-ROCKET model is developed from the Multi-ROCKET model, 37 a recently introduced convolution-based model. The FM-ROCKET model uses a 1D convolutional layer to extract features from percussion-induced audio signals and employs a linear classifier to classify the features, achieving better performance than other state-of-the-art DL-based methods in SHM. In summary, the main contributions of this article are as follows:
(1) This article, for the first time, studies the underwater detection of bolted flange looseness through an integrated percussion and DL method.
(2) Starting from the features computed by the Multi-ROCKET model, we modify two of the original features using different scale factors. Experimental results from three case studies (I, II, III) show the effectiveness of the modified feature representation.
(3) Compared to other advanced DL-based methods in SHM and other fields, the proposed FM-ROCKET model achieves better performance on independent datasets collected under different scenarios (assembly, operator, temperature, time, object), which shows the robustness of the proposed method to environmental and operational variants (case study II) and to different detection objects with similar structure (case study III).
(4) Compared to the Multi-ROCKET model, the proposed FM-ROCKET model achieves similar classification performance in case studies I and III. However, it obtains better performance in case study II, where the training set and the corresponding test set are collected from the same flange and are independent of each other, which shows better robustness to environmental and operational variants.
The rest of this paper is organized as follows: Section ‘Feature extraction: MFCC’ introduces MFCC, which we use as a baseline for comparison with the proposed method. Section ‘The proposed method: FM-ROCKET’ elaborates the relevant theoretical background and the proposed method. Section ‘Experimental setup’ describes the experimental setup. Section ‘Results and discussion’ presents the experimental results and the corresponding discussion, and Section ‘Conclusion’ concludes the paper.
Feature extraction: MFCC
For audio signal processing, many techniques can be used to extract representative features from an audio signal, such as bark band energy features, 38 power spectral density,26,39 and so on. Among them, MFCC,30,40,41 which is based on the human perceptual frequency range, is an effective and commonly used feature representation of audio signals. In this paper, MFCC mainly serves as a baseline for comparison with the proposed method in the underwater detection of bolted flange looseness, so we briefly introduce it here. Figure 1 illustrates the main steps of MFCC processing, and the corresponding formulas are as follows:
The flow chart of mel-frequency cepstral coefficients (MFCC) processing.
(1) Pre-emphasis
The pre-emphasis step applies a high-pass filter to balance the spectrum of the audio signal, which has a steep roll-off in the high-frequency region.
where
(2) Framing and windowing
Next, according to the specific frame length and step, the signal is split into frames,
where
where
(3) Fourier transform
In the third step, each frame is transformed into a magnitude spectrum by the discrete Fourier transform (DFT),
where
(4) Mel-filtering
In this step, we predefine a Mel filter bank,
where
where
(5) Discrete cosine transform
Finally, the MFCC matrix (R × T) is obtained by applying the discrete cosine transform to the Mel-coefficient matrix above.
where T is often set to 12.
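The five steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this study: the pre-emphasis coefficient (0.97), the Hamming window, the use of the power spectrum before Mel-filtering, the 2595·log10(1 + f/700) Mel mapping, the frame length, step, and number of filters, and the choice of keeping DCT coefficients 1 to T are all common textbook defaults that the text does not specify. All function names are hypothetical.

```python
import cmath
import math


def pre_emphasis(signal, alpha=0.97):
    """Step 1: high-pass filter y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_and_window(signal, frame_len, step):
    """Step 2: split into overlapping frames and apply a Hamming window to each."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    return [
        [signal[start + n] * window[n] for n in range(frame_len)]
        for start in range(0, len(signal) - frame_len + 1, step)
    ]


def magnitude_spectrum(frame):
    """Step 3: magnitude of the one-sided DFT, computed directly for clarity."""
    size = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size)))
        for k in range(size // 2 + 1)
    ]


def hz_to_mel(freq_hz):
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_filters, frame_len, sample_rate):
    """Step 4: triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    # Edge positions expressed as (fractional) DFT bin indices.
    edges = [
        mel_to_hz(top * i / (n_filters + 1)) * frame_len / sample_rate
        for i in range(n_filters + 2)
    ]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(frame_len // 2 + 1):
            if left < k <= centre:
                row.append((k - left) / (centre - left))
            elif centre < k < right:
                row.append((right - k) / (right - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_len=64, step=32, n_filters=20, n_coeffs=12):
    """Return an R x T matrix: R frames, T = n_coeffs cepstral coefficients."""
    bank = mel_filter_bank(n_filters, frame_len, sample_rate)
    matrix = []
    for frame in frame_and_window(pre_emphasis(signal), frame_len, step):
        power = [mag * mag / frame_len for mag in magnitude_spectrum(frame)]
        # Small constant avoids log(0) for filters that cover no DFT bin.
        log_mel = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in bank]
        # Step 5: DCT-II over the log mel energies, keeping coefficients 1..T.
        matrix.append([
            sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters) for m, e in enumerate(log_mel))
            for c in range(1, n_coeffs + 1)
        ])
    return matrix


# Synthetic 440 Hz tone sampled at 8 kHz, used only to exercise the pipeline.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(256)]
features = mfcc(tone, sample_rate=8000)
print(len(features), len(features[0]))
```

With 256 samples, a frame length of 64, and a step of 32, the sketch yields R = 7 frames of T = 12 coefficients each. A direct DFT is used only to keep the example dependency-free; a fast Fourier transform would be preferred for real 48 kHz recordings.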
The proposed method: FM-ROCKET
Random Convolution Kernel Transform
Derived from the typical 1D convolutional kernel in convolutional neural networks, ROCKET 42 uses a very large number of 1D convolutional kernels with randomly chosen lengths, biases, dilations, weights, and paddings to compute feature maps of the input time series. In particular, the length of each kernel is selected from three values (7, 9, 11) with equal probability. In addition, the weights are sampled from a normal distribution and the biases are sampled from a uniform distribution. The dilation is sampled on the following exponential scale,
where
After the convolution, ROCKET computes two features from each feature map, which means that it produces two real values per kernel. One is the Maximum Value (MV) of the feature map. The other is the Proportion of Positive Values (PPV), which captures the proportion of positive values in the feature map. Notably, in the literature,42,43 PPV has been shown to be a significant feature that yields meaningfully higher accuracy than other features, such as the mean value, in classification problems.
Furthermore, the literature37,42,43 demonstrates that, with the features produced by ROCKET, a linear classifier can achieve higher classification accuracy than other classifiers, even for datasets where the number of features dwarfs both the number and length of the samples (1D sequences).
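The random kernel generation and the two ROCKET features can be sketched as follows. This is an illustrative reading of the description above, not the reference implementation: the mean-centering of the weights, the bias range U(−1, 1), the dilation rule d = ⌊2^x⌋ with x drawn uniformly from [0, log2((L_input − 1)/(L_kernel − 1))], and the equal-probability choice of padding are assumptions taken from the commonly cited ROCKET description, and the function names are hypothetical.

```python
import math
import random


def random_kernel(input_len, rng):
    """Draw one random kernel: (weights, bias, dilation, padding)."""
    length = rng.choice((7, 9, 11))                    # equal probability
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mean = sum(weights) / length
    weights = [w - mean for w in weights]              # assumed mean-centering
    bias = rng.uniform(-1.0, 1.0)                      # assumed range
    upper = math.log2((input_len - 1) / (length - 1))
    dilation = int(2 ** rng.uniform(0.0, upper))       # exponential scale
    padding = ((length - 1) * dilation) // 2 if rng.random() < 0.5 else 0
    return weights, bias, dilation, padding


def apply_kernel(series, weights, bias, dilation, padding):
    """Convolve with a dilated kernel and return (PPV, MV) of the feature map."""
    padded = [0.0] * padding + list(series) + [0.0] * padding
    span = (len(weights) - 1) * dilation
    feature_map = [
        bias + sum(w * padded[start + j * dilation] for j, w in enumerate(weights))
        for start in range(len(padded) - span)
    ]
    ppv = sum(1 for v in feature_map if v > 0) / len(feature_map)
    return ppv, max(feature_map)


def rocket_transform(series, n_kernels, rng):
    """Two features per kernel, concatenated into one vector."""
    features = []
    for _ in range(n_kernels):
        features.extend(apply_kernel(series, *random_kernel(len(series), rng)))
    return features


series = [math.sin(0.1 * n) for n in range(200)]
vector = rocket_transform(series, n_kernels=50, rng=random.Random(0))
print(len(vector))
```

Fifty kernels give a 100-dimensional vector here; the original ROCKET uses a far larger number of kernels, which is why a simple linear classifier suffices downstream.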
Multi-ROCKET
Unlike ROCKET, Multi-ROCKET uses a fixed length (9) for all kernels, and the weights take one of two values (−1, 2). In addition, to balance classification accuracy against the computational advantages of a small set of kernels, Multi-ROCKET adopts a fixed group of 84 kernels. This group has been shown in the literature 37 to produce high classification accuracy and is kept as a default parameter of Multi-ROCKET.
In the fixed group (84 kernels), each kernel uses a fixed set of dilations sampled according to Equation (13). In terms of the bias, for each kernel/dilation combination, a sample is randomly selected from the training set, the corresponding convolution output is calculated, and quantiles of that output are taken as the bias values. In addition, zero padding is applied to alternating kernel/dilation combinations. Consequently, the randomness of Multi-ROCKET comes only from the bias, and the other parameters are fixed.
In addition, Multi-ROCKET removes the MV and adds three new features to increase the diversity and discriminative power of the features: Mean of Positive Values (MPV), Mean of Indices of Positive Values (MIPV), and Longest Stretch of Positive Values (LSPV). Therefore, a feature map is represented by four different features, namely, PPV, MPV, MIPV, and LSPV. An example of calculating the four features is shown in Figure 2. The detailed formulas of the four features are presented as follows:

Single feature map.
First, the output of the 1D convolutional operation is computed below,
where
(1) Proportion of Positive Values
The PPV is defined as,
where
(2) Mean of Positive Values
The MPV is defined as
where
(3) Mean of Indices of Positive Values
The MIPV is defined as,
where
(4) Longest Stretch of Positive Values
The LSPV is defined as,
where
Notably, Multi-ROCKET extracts the four features not only from the input time series but also from its first-order difference, which increases the diversity of the features. As a result, both the input time series and its first-order difference are convolved with the fixed group of 84 kernels. Finally, these feature representations are used to train or test a linear classifier.
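The four features and the fixed kernel group can be sketched as follows. Several details are assumptions made for illustration: the 84 kernels are built by placing the weight 2 at three of the nine positions and −1 elsewhere (which gives C(9, 3) = 84 kernels), indices are counted from zero, MPV is set to 0 and MIPV to −1 when a feature map has no positive value, the bias is subtracted from the convolution output, and the bias is taken as a quantile of the same series rather than of a randomly chosen training sample. Padding is omitted, and all function names are hypothetical.

```python
from itertools import combinations


def fixed_kernels():
    """84 length-9 kernels: weight 2 at three positions, -1 at the other six."""
    return [
        [2.0 if i in positions else -1.0 for i in range(9)]
        for positions in combinations(range(9), 3)
    ]


def first_difference(series):
    return [b - a for a, b in zip(series, series[1:])]


def convolve(series, weights, dilation):
    """Dilated convolution without padding or bias."""
    span = (len(weights) - 1) * dilation
    return [
        sum(w * series[start + j * dilation] for j, w in enumerate(weights))
        for start in range(len(series) - span)
    ]


def quantile(values, q):
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


def four_features(feature_map):
    """Return (PPV, MPV, MIPV, LSPV) of one feature map."""
    positive_idx = [i for i, v in enumerate(feature_map) if v > 0]
    count = len(positive_idx)
    ppv = count / len(feature_map)
    mpv = sum(feature_map[i] for i in positive_idx) / count if count else 0.0
    mipv = sum(positive_idx) / count if count else -1.0
    longest = current = 0
    for value in feature_map:
        current = current + 1 if value > 0 else 0
        longest = max(longest, current)
    return ppv, mpv, mipv, float(longest)


def transform(series, dilations=(1, 2), q=0.5):
    """Features from the series and from its first-order difference."""
    features = []
    for representation in (series, first_difference(series)):
        for weights in fixed_kernels():
            for dilation in dilations:
                raw = convolve(representation, weights, dilation)
                bias = quantile(raw, q)  # assumed: quantile of this very output
                features.extend(four_features([v - bias for v in raw]))
    return features


example = [0.5, -1.0, 2.0, 3.0, 1.0, -0.5, 4.0, -2.0]
print(four_features(example))
```

For the example map, the positive values sit at indices 0, 2, 3, 4, and 6, so PPV = 5/8 = 0.625, MPV = 10.5/5 = 2.1, MIPV = 15/5 = 3.0, and LSPV = 3 (indices 2 to 4). With two dilations, `transform` returns 2 × 84 × 2 × 4 = 1344 features per signal.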
The Proposed FM-ROCKET
Based on the above four features, we modify them and propose the FM-ROCKET model. Compared with the values of PPV and MPV, the values of MIPV and LSPV are considerably larger. This difference may bring more uncertainty into the model. Therefore, on the one hand, we introduce an adjustable scale factor
where
In addition, the features PPV and MPV are retained. The final feature representation vector of the single-hit audio signal is shown as follows,
where
where

An example of Longest Stretch of Positive Values (LSPV), Scaled Longest Stretch of Positive Values (SLSPV), Mean of Indices of Positive Values (MIPV), and Scaled Mean of Indices of Positive Values (SMIPV) values.
The overall architecture of the underwater bolted flange looseness detection is illustrated in Figure 4. The percussion-induced sound is collected by a smartphone with a waterproof jacket and, after the audio signal segmentation, the single-hit audio signal and its first order difference are fed into the FM-ROCKET model. First, the single-hit audio signal and its first order difference convolve with

Flowchart of the proposed method (FM-ROCKET).
Experimental setup
To demonstrate the effectiveness of the proposed FM-ROCKET model, we implemented the proposed method along with other methods on two stainless flanges (A and B). As depicted in Figure 5(b), the two flanges have the same dimensions and each one employs four pairs of bolts and nuts. By tightening the bolts with a torque wrench, we set five different preload levels (five classes) for each flange (Table 1). Under each preload level, a steel hammer is used to tap the surface of the flange and a smartphone with a microphone (48 kHz sampling frequency, 16-bit resolution) is used to collect the percussion-induced audio signals (Figure 5(a)). At the data collection stage, for each pair of flanges, we tap around 120 times for each class. In particular, each tap is performed with a random force at a random point in the area encircled by red lines in Figure 5(b). As a result, each dataset includes around 600 single-hit audio signals. In the experimental stage, the percussion can be performed manually. However, in real scenarios that are not accessible to people, an unmanned underwater vehicle or underwater robot could execute the percussion task. It is worth noting that, to account for environmental and operational variants, eight independent datasets (Table 2) are captured in different scenarios (assembly, operator, temperature, time). Specifically, the eight datasets are collected by two researchers in 1 week, and the operators first loosen the bolts before tightening them to each preload level in each dataset, which ensures that classes are independent of each other and datasets are independent of each other. Datasets (1, 2, 3, 4) are taken from flange A and datasets (5, 6, 7, 8) are taken from flange B. Figure 6 shows single-hit audio signals under different datasets and preload levels. The received audio signals show no obvious relationship with the preload levels.
Moreover, to show intuitively how the underwater environment influences the detection, Figure 7 displays percussion audio signals collected under water and in the air. It is apparent that the underwater environment reduces the audio quality and introduces considerable noise. In addition, the FM-ROCKET model is implemented in Python (3.6) with the main library tensorflow-gpu (2.6.2). The computer is equipped with an Intel i7-11800H CPU, 32 GB of RAM, and an NVIDIA GeForce RTX 3070 GPU.

Experimental setup: (a) apparatus and (b) two flanges.
Arrangement of five classes.
Number of samples of eight independent datasets.

Percussion-induced audio signals under different preload levels and datasets.

Percussion-induced audio signals collected under the water (left) and in the air (right).
Results and discussion
In real-world scenarios, bolted connection looseness is rarely detected under identical conditions, since the environment involves randomness and uncertainty. Therefore, it is necessary to test the robustness of the model to environmental and operational variants. Additionally, it is impractical to build a dataset that covers enough data from countless underwater bolted connections, so the serviceability or applicability of the trained model to other connections is another concern. These concerns motivate three case studies. In case study I, we combine all eight datasets into one large dataset and split it into training and test sets. In case study II, one of the eight datasets is selected as the test set and the remaining seven datasets are merged into a training set. In case study III, the datasets from flange A (B) are taken as the training set and those from flange B (A) as the test set.
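The dataset-level partitions for case studies II and III can be written down explicitly. The sketch below only enumerates which datasets go into training and testing; the dataset-to-flange mapping follows the experimental setup (datasets 1 to 4 from flange A, 5 to 8 from flange B), while the function names and the ordering of the trials are assumptions for illustration.

```python
# Dataset-to-flange mapping as described in the experimental setup.
FLANGE_OF_DATASET = {1: "A", 2: "A", 3: "A", 4: "A", 5: "B", 6: "B", 7: "B", 8: "B"}


def leave_one_dataset_out(dataset_ids):
    """Case study II: each dataset is the test set once; the rest form the training set."""
    ordered = sorted(dataset_ids)
    for held_out in ordered:
        yield [d for d in ordered if d != held_out], [held_out]


def cross_flange_trials(flange_of_dataset):
    """Case study III: train on all datasets of one flange, test on the other flange."""
    flanges = sorted(set(flange_of_dataset.values()))
    for train_flange in flanges:
        train = sorted(d for d, f in flange_of_dataset.items() if f == train_flange)
        test = sorted(d for d, f in flange_of_dataset.items() if f != train_flange)
        yield train, test


case_two = list(leave_one_dataset_out(FLANGE_OF_DATASET))
case_three = list(cross_flange_trials(FLANGE_OF_DATASET))
print(len(case_two), case_three)
```

Keeping the partition at the dataset level, rather than at the sample level, is what makes the test sets independent of the training sets in these two case studies.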
Case study I: Performance under different training/test splitting ratios
In this case, we combine all the datasets into a single large dataset (4889 samples) and split it into training and test sets using different training/test ratios (8:2, 7:3, 6:4, 5:5). To keep both sets balanced, each class occupies the same percentage in the training and test sets. In addition, samples from each dataset account for the same proportion in both the training and test sets. Subsequently, to illustrate the effectiveness of the proposed method, we compare the FM-ROCKET model with several DL-based models from the SHM literature. Table 3 reports the classification accuracy of each method, averaged over four repeated experiments. Since machine learning and DL methods randomly initialize their trainable parameters, a model may end up with different trained parameters in each repeated run. Hence, it is necessary to run the model several times and compute the average accuracy. The proposed method performs slightly better than the other DL-based methods, and its computational cost is relatively low. The computational time for both the FM-ROCKET model and the Multi-ROCKET model is around 50 s, which is far less than that of the other methods. Overall, in the detection of underwater bolted flange looseness, the classification performances of all the methods are similar and show no significant difference under the four training/test splitting ratios.
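The balanced split described above can be sketched as a stratified split over (dataset, class) pairs, followed by averaging over repeated runs. This is a minimal sketch: the sample tuples, the `evaluate` callback, and the choice to redraw the split with a new seed in every repeated run are assumptions, since the text does not state whether the split itself is repeated.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction, seed):
    """Split sample indices so each (dataset, class) stratum keeps the same ratio.

    `samples` is a list of (dataset_id, class_label, payload) tuples.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for index, (dataset_id, label, _) in enumerate(samples):
        strata[(dataset_id, label)].append(index)
    train, test = [], []
    for key in sorted(strata):
        indices = strata[key]
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


def mean_accuracy(evaluate, samples, train_fraction, seeds=(0, 1, 2, 3)):
    """Average the accuracy returned by `evaluate(train_idx, test_idx)` over four runs."""
    scores = [evaluate(*stratified_split(samples, train_fraction, seed)) for seed in seeds]
    return sum(scores) / len(scores)


# Toy stand-in for the pooled dataset: 8 datasets x 5 classes x 10 samples.
samples = [(d, c, None) for d in range(1, 9) for c in range(5) for _ in range(10)]
train_idx, test_idx = stratified_split(samples, train_fraction=0.8, seed=0)
print(len(train_idx), len(test_idx))
```

On the toy data, an 8:2 ratio gives 320 training and 80 test indices, with exactly 8 training samples in every (dataset, class) stratum.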
Comparison of test set accuracy (%) among various methods and training/test splitting ratios.
CNN: convolutional neural network; MFCC: mel-frequency cepstral coefficients; ROCKET: Random Convolution Kernel Transform.
LSTM: long short-term memory; The proposed method: Feature-reduced Multiple Random Convolution Kernel Transform.
Bold values are used to highlight the classification accuracy of the proposed method.
Case study II: Robustness to environmental and operational variants
Since the eight independent datasets are collected under different scenarios (assembly, operator, temperature, time), eight types of experiments are designed in this case. In the
Comparison of test set accuracy (%) among various methods.
CNN: convolutional neural network; MFCC: mel-frequency cepstral coefficients; ROCKET: Random Convolution Kernel Transform.
LSTM: long short-term memory; The proposed method: Feature-reduced Multiple Random Convolution Kernel Transform.
Bold values are used to highlight the classification accuracy of the proposed method.

Confusion matrices of some typical classification results.
On independent and different test sets, the proposed method outperforms all the other methods, demonstrating its advantages in the detection of underwater bolted flange looseness. However, for test sets 2 and 7, the classification accuracy is relatively poor. For dataset 2, our model cannot effectively classify the audio signals at 60 ft-lbs into their actual class and, for dataset 7, it cannot effectively classify the audio signals at 80 ft-lbs into their actual class. A possible explanation is that, as mentioned above, these datasets are independently collected under different scenarios and contain randomness and uncertainty, so the percussion-induced audio signals may suffer from a “saturation” or “similarity” problem under high preload levels in some datasets. The audio signal energy in Figure 9 reflects this possible problem. The signal energy distribution under high preload levels (60, 80 ft-lbs) is more compact than that under low preload levels (0, 20, 40 ft-lbs). In addition, the signal energies under the two high preload levels (60, 80 ft-lbs) are similar. Furthermore, this “saturation” or “similarity” phenomenon also appears when the K-means clustering algorithm is applied to datasets 2 and 7 in Figure 10. The K-means algorithm assigns most of the audio signals under 60 and 80 ft-lbs to the same group (red dashed box in Figure 10), which reveals the similarity between the audio signals under 60 and 80 ft-lbs.
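The energy comparison above can be illustrated with a short sketch. It assumes that the energy of a single-hit signal is the sum of its squared samples and that compactness is summarized by the per-class standard deviation; the text does not give the exact definition used for Figure 9, and the function names are hypothetical.

```python
import statistics


def signal_energy(signal):
    """Assumed definition: sum of squared samples of one single-hit signal."""
    return sum(x * x for x in signal)


def energy_spread_by_class(signals, labels):
    """Return {label: (mean energy, population standard deviation)} per preload level."""
    grouped = {}
    for signal, label in zip(signals, labels):
        grouped.setdefault(label, []).append(signal_energy(signal))
    return {
        label: (statistics.mean(energies), statistics.pstdev(energies))
        for label, energies in grouped.items()
    }


# Toy signals only; labels stand for preload levels in ft-lbs.
toy_signals = [[1.0], [3.0], [2.0], [2.0]]
toy_labels = [0, 0, 60, 60]
print(energy_spread_by_class(toy_signals, toy_labels))
```

A small standard deviation within a class, together with close mean energies between two classes, would correspond to the compact and overlapping distributions described for the 60 and 80 ft-lbs levels.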

Energy of percussion-induced audio signals of (a) dataset 2 and (b) dataset 7.

Clustering results: (a) true label of dataset 2, (b) clustered label of dataset 2, (c) true label of dataset 7, and (d) clustered label of dataset 7.
Case study III: Robustness to different detection objects with similar structure
In this case, we use datasets from different flanges as the training and test sets, respectively, to study the method’s robustness and its potential for practical application. As mentioned above, datasets (1, 2, 3, 4) are captured from flange A and datasets (5, 6, 7, 8) are from flange B. The trial settings are shown in Table 5, the performances of the different methods are illustrated in Figure 11, and the confusion matrices are given in Figure 12. As in the above two case studies, the classification accuracy is the average of four repeated experiments. As expected, the performance of all methods decreases because the test set comes from a different flange than the training set. It is important to point out that the proposed method and the Multi-ROCKET model achieve similar performance, which is the best among all methods and far exceeds that of the others. Although none of the methods performs well in this case, the results still demonstrate that, in the underwater environment, the FM-ROCKET and Multi-ROCKET models surpass other DL-based methods in terms of applicability to another similar bolted connection.
Settings for different trials.

Comparison of test set accuracy among significant works.

Confusion matrices of Trial 1 and Trial 2: (a) FM-ROCKET (test set: B) and (b) FM-ROCKET (test set: A).
Conclusion
To monitor bolted flange looseness in the underwater environment, we propose a novel detection method using percussion-induced audio signals, DL, and shallow learning. Specifically, to process the percussion-induced audio signal, we develop the FM-ROCKET model, which achieves promising classification accuracy. Unlike current DL-based methods, the proposed FM-ROCKET model combines a 1D convolutional layer (a DL method), which extracts features from the input audio signal, with a ridge classifier (a linear classifier, a shallow learning method), which classifies the feature representation of the signal. Notably, many state-of-the-art DL-based models in SHM lack robustness to environmental and operational variants and to different detection objects with similar structure in the underwater environment. In contrast, as verified in three case studies, the proposed FM-ROCKET model demonstrates comparable or better performance in these two aspects than several significant DL-based models in SHM.
In case study I, the training set and the corresponding test set are dependent on each other because they are drawn from the same pooled dataset. In case study II, the training set and the corresponding test set are independent of each other since they come from different datasets collected on the same flange. In case study III, the training set and the corresponding test set are even more independent of each other since they come from different datasets collected on different flanges. Case study I shows that all the methods have similar classification performance when the training and test sets are dependent on each other. In case study II, the proposed method and the Multi-ROCKET model surpass the other methods. In addition, with the help of the reduced features, the proposed method achieves better performance than the Multi-ROCKET model on five independent test sets (D1, D2, D4, D6, D8) and similar performance on the remaining three (D3, D5, D7). Therefore, in case study II, the overall classification performance of the proposed method is better than that of the Multi-ROCKET model and far better than that of the other methods, which shows its robustness to environmental and operational variants. Last, case study III indicates that, when the training and test sets are more independent of each other, the proposed method obtains classification performance similar to that of the Multi-ROCKET model and outperforms the other methods, which shows its robustness to different detection objects with similar structure in the underwater environment. In summary, the proposed method performs similarly to the Multi-ROCKET model in case studies I and III, while in case study II it outperforms the Multi-ROCKET model thanks to the reduced features.
In future work, several issues will be investigated: (1) real-world noise will be taken into consideration; (2) the proposed method will be extended to cases with higher preload levels; (3) the proposed method will be further improved to determine the location of the loosened bolts on the flange; (4) the robustness of the classification model to different detection objects with similar structure will be further enhanced. Because it does not require installing constant-contact sensors, this easy-to-implement and low-cost detection method has great potential in future applications.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by Texas Commission on Environmental Quality through Subsea Systems Institute Award #582-15-57593. This project was paid for [in part] with federal funding from the Department of the Treasury through the State of Texas under the Resources and Ecosystems Sustainability, Tourist Opportunities, and Revived Economies of the Gulf Coast States Act of 2012 (RESTORE Act). The content, statements, findings, opinions, conclusions, and recommendations are those of the author(s) and do not necessarily reflect the views of the State of Texas or the Treasury.
