Abstract
The wide variety of underwater targets and the high level of environmental noise strongly affect the classification accuracy of underwater target radiated noise, and traditional classification methods based on signal features can no longer meet the requirements of underwater target identification. In this paper, we propose an underwater target radiated noise classification method based on enhanced images and a convolutional neural network. First, the radiated noise signal is converted into an enhanced image; the converted image data set is then used as the input of a convolutional neural network for model training; finally, the strength of convolutional neural networks in image classification is exploited to classify the radiated noise. To identify the best enhanced image transformation, several transformation methods are applied and their classification results are compared. The experimental results show that the combination of Gramian Angular Fields and a convolutional neural network achieves the highest classification accuracy and the best classification efficiency.
Introduction
The wide variety of underwater targets, complex working conditions and high environmental noise levels, together with the non-stationary, non-linear and non-Gaussian nature of underwater target radiated noise itself, combine to make underwater target recognition increasingly difficult. Traditional underwater target recognition extracts features such as power spectrum estimation 1 and Low Frequency Analysis and Recording (LOFAR) 2 from the radiated noise signal and then applies classification models such as Support Vector Machines 3–6 or Decision Trees to complete the classification and recognition. This approach is highly sensitive to noise, which can even cause the recognition process to fail, and it has fallen well behind growing data demands. There is therefore an urgent need for a method that is less affected by noise and offers higher classification accuracy and efficiency. In the method based on enhanced images and a convolutional neural network, the original radiated noise signal is transformed into an image without passing through a noise processing stage, which effectively reduces the loss of useful information.
Scholars have done extensive research on traditional feature extraction from underwater target radiated noise signals. Xiao et al.7–11 proposed a multi-scale spectral feature set for the hydroacoustic target recognition problem; it has a clear physical meaning, is well suited to classification with traditional machine learning methods, and reached recognition accuracies of 98% and 90% for static and dynamic targets, respectively. This method is also more robust in high-noise and changing environments. Wang et al.12,13 argued that elastic information is the only way to recognise underwater targets with the same geometry but different materials, and investigated three wavelet-based feature extraction methods to obtain the elastic information of underwater target echoes. Lei et al. 14 proposed a feature extraction method for underwater target radiated noise based on compressed sensing and multiscale sample entropy: the radiated noise signal is processed by compressed sensing, and variational mode decomposition is used to extract the centre frequency of the maximum-energy intrinsic mode function as a feature for underwater target identification. Their experimental results show that the method classifies and identifies targets quickly and effectively while reducing the dependence on the operator and on a priori knowledge. Li et al.15–29 proposed a multi-scale reverse dispersion entropy for underwater target radiated noise feature extraction, which can describe the complexity of signals at different scales; their experimental results show better separability and a higher recognition rate. Zhang et al. 30 proposed a receiver-by-receiver imaging algorithm, and simulation and real-data experiments show that it successfully avoids ghost targets with the same wide sub-block. Choi et al. 31 proposed a synthetic aperture sonar algorithm that applies compressed sensing, and verified with simulated and experimental data that the method is superior and robust with respect to sensor loss. Traditional feature extraction methods are inevitably affected by various kinds of noise when processing underwater target radiated noise, which leaves the recognition model with poor robustness, poor generalisation ability and a low recognition rate, so these methods can no longer meet growing data demands.
Scholars have also done extensive research on machine learning methods for classifying and detecting underwater target radiated noise. The support vector machine is one of the most widely used of these methods. Sherin and Supriya32–34 used the symbiotic organisms search algorithm, the genetic algorithm and the particle swarm optimisation algorithm, respectively, to optimise the kernel function and parameters of a support vector machine, and used the trained model to classify the acoustic features of underwater target radiated noise with high accuracy. With the continuous development of computer technology, deep learning methods have become widely used. Feng et al.35,36 proposed a hydroacoustic target recognition method based on a convolution-free architecture that perceives both global and local information in acoustic spectrograms to improve recognition accuracy; their experimental results show that it outperforms state-of-the-art convolutional neural network methods. Cao et al. 37 proposed a new acoustic feature classification method for underwater targets based on support vector machine kernel function optimisation, in which the trained support vector machine model classifies the acoustic features of underwater target radiated noise with higher accuracy. Although deep learning methods have been widely applied to underwater target noise classification, they still classify pre-processed hydroacoustic signals and therefore cannot avoid losing useful information in the noise processing stage, which limits further gains in classification accuracy.
This paper proposes an underwater target radiated noise classification and recognition method based on enhanced images and a convolutional neural network. The original radiated noise signal is converted, without any pre-processing, into an enhanced image that is used as input to train the convolutional neural network, and the strength of convolutional neural networks in image classification is exploited to achieve high-precision recognition of underwater targets. By comparing several enhanced image conversion methods, this paper also identifies the best-performing one.
The enhanced image transformation of underwater target radiated noise has several advantages. It better preserves the global characteristics of the original signal data, reveals new patterns and structural features in the data, and helps to capture more complex relationships in the signal. It does not perform a frequency domain transform, so it is not limited by frequency domain resolution and is more robust across signals in different frequency ranges. It considers the characteristics of the entire signal, which reduces the complexity of parameter selection and makes the method more general. Finally, it better captures nonlinear relationships and is more sensitive to some nonlinear patterns.
This paper is organised as follows: Section “Overall architecture of underwater target radiated noise recognition model based on enhanced image-convolutional neural networks” outlines the overall architecture and computational flow of the enhanced image convolutional neural network model, Section “Overview of different image enhancement methods” details the mathematical foundations of the different enhanced image transformation methods and Section “Experiments and results” describes the experimental methods and results of the dataset. Section “Conclusion” provides a discussion and summary.
Overall architecture of underwater target radiated noise recognition model based on enhanced image-convolutional neural networks
The underwater target radiated noise recognition model based on enhanced images and a convolutional neural network converts the original radiated noise signal into an enhanced image, uses the converted images as the input of a convolutional neural network for model training, and exploits the strength of convolutional neural networks in image classification to classify and recognise the radiated noise. The overall architecture is shown in Figure 1.

Figure 1. Overall architecture of the underwater target radiated noise recognition model based on enhanced image-convolutional neural network.
The first step is to transform the original underwater target radiation noise signal into an enhanced image.
The second step is to build a convolutional neural network model, design the network structure and select appropriate network parameters.
In the third step, 70% of the converted enhanced images are extracted as the training set to train the convolutional neural network model.
In the fourth step, 30% of the converted enhanced images are extracted as the test set to validate the convolutional neural network model.
In the fifth step, the results of the underwater target radiated noise detection are obtained to evaluate the classification accuracy of the model.
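The partition into training and test images in the third and fourth steps can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the split is drawn separately within each class so that every class keeps the same proportion, and the function name `split_per_class`, the seed and the label layout are hypothetical.

```python
import random


def split_per_class(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), sampled separately within each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                      # random order inside the class
        cut = round(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical data set: 5 classes with 100 enhanced images each.
labels = [c for c in range(5) for _ in range(100)]
train_idx, test_idx = split_per_class(labels, train_fraction=0.7)
print(len(train_idx), len(test_idx))
```

The training fraction is a parameter, so the same helper covers other train/test ratios.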
Overview of different image enhancement methods
Gramian Angular Fields, GAF
The GAF method converts the scaled underwater target radiated noise signal from the Cartesian coordinate system to the polar coordinate system. By considering the angle between different points, it uses the dot product and cosine similarity to capture more complex relationships in the signal and to identify the temporal correlation between different time points. The polar coordinate encoding of the signal is turned into a Gramian matrix through trigonometric operations, thus converting the one-dimensional underwater target radiated noise into a two-dimensional image.
First, the original underwater target radiated noise is normalised to the interval [−1, +1], let the underwater target radiated noise signal
where
Then, the normalised values are transformed into polar coordinates
where
The polar transformation code here incorporates the normalised value
GAF defines a special inner product:
Disassembled to obtain
GAF can be interpreted as inner products with penalty terms:
GAF shows the structure of the temporal correlation between the data, with the mean value of the penalty term pointing to
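As an illustration of the encoding described in this subsection, the following is a minimal sketch of the commonly used GAF formulation. In that formulation each rescaled sample x_i is mapped to an angle phi_i = arccos(x_i), and the image entry is cos(phi_i + phi_j) = x_i x_j - sqrt(1 - x_i^2) sqrt(1 - x_j^2), where the second term plays the role of a penalty on the plain inner product. The function names and sample values are illustrative assumptions, and the notation may differ from the equations of this paper.

```python
import math


def rescale(signal):
    """Rescale a non-constant sequence linearly onto [-1, 1]."""
    lo, hi = min(signal), max(signal)
    return [((x - hi) + (x - lo)) / (hi - lo) for x in signal]


def angular_encoding(signal):
    """Polar angles phi_i = arccos(x_i) of the rescaled samples."""
    # The clamp guards against rounding slightly outside [-1, 1].
    return [math.acos(max(-1.0, min(1.0, x))) for x in rescale(signal)]


def gasf(signal):
    """Summation field: G[i][j] = cos(phi_i + phi_j)."""
    phi = angular_encoding(signal)
    return [[math.cos(a + b) for b in phi] for a in phi]


def gadf(signal):
    """Difference field: G[i][j] = sin(phi_i - phi_j)."""
    phi = angular_encoding(signal)
    return [[math.sin(a - b) for b in phi] for a in phi]


segment = [0.3, -1.2, 0.8, 2.0, -0.5]   # hypothetical samples
image = gasf(segment)                   # 5 x 5 symmetric matrix
```

The summation field is symmetric and its diagonal equals 2 x_i^2 - 1, so the rescaled signal can be recovered from the diagonal up to sign. The sketch does not show which variant (summation or difference) the experiments use.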
Markov Transition Fields, MTF
MTF is an image coding method based on the Markov transition matrix. It treats the underwater target radiated noise signal over time as a Markov process, constructs a Markov transition matrix from it, and then extends that matrix to a Markov transition field to realise image coding.
For an underwater target radiating noise signal
(1) The underwater target radiated noise
(2) Replace each sample of the underwater target radiated noise signal with its corresponding bin number;
(3) Construct the transfer matrix
(4) Construct the Markov Transition Fields
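The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the bins are quantile bins, the transition matrix is row-normalised, and the field entry for time points i and j is the transition probability between their bins. The function names, the number of bins and the sample signal are hypothetical.

```python
import bisect


def quantile_bins(signal, n_bins):
    """Steps 1-2: assign each sample the index of its quantile bin."""
    ordered = sorted(signal)
    n = len(ordered)
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    return [bisect.bisect_right(edges, x) for x in signal]


def transition_matrix(bins, n_bins):
    """Step 3: row-normalised counts of bin a being followed by bin b."""
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def markov_transition_field(signal, n_bins=4):
    """Step 4: M[i][j] = W[bin(x_i)][bin(x_j)] for all pairs of time points."""
    bins = quantile_bins(signal, n_bins)
    w = transition_matrix(bins, n_bins)
    return [[w[a][b] for b in bins] for a in bins]


signal = [0, 1, 2, 3, 4, 5, 6, 7]   # hypothetical samples
field = markov_transition_field(signal, n_bins=4)
```

Unlike the transition matrix, whose size is fixed by the number of bins, the field has one row and one column per time point, so it keeps the temporal ordering of the signal.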
Recurrence Plots, RP
Recurrence Plots are an important method for analysing the chaotic and non-stationary nature of underwater target radiated noise; they can reveal the internal structure of underwater target radiated noise signals and provide a priori knowledge of similarity, information content and predictability. A recurrence plot is an image of the distances between trajectories extracted from the original underwater target radiated noise. Given a set of underwater target radiated noise signals
where
where
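A recurrence plot of this kind can be sketched as follows. The sketch assumes the usual time-delay embedding, in which trajectory i collects `dim` samples spaced `delay` apart, and the Euclidean distance between trajectories. The optional threshold, which turns distances into a binary image, and all names and parameter values are illustrative assumptions.

```python
import math


def embed(signal, dim=2, delay=1):
    """Trajectories s_i = (x_i, x_{i+delay}, ..., x_{i+(dim-1)*delay})."""
    n = len(signal) - (dim - 1) * delay
    return [[signal[i + k * delay] for k in range(dim)] for i in range(n)]


def distance_matrix(signal, dim=2, delay=1):
    """Pairwise Euclidean distances between all trajectories."""
    trajectories = embed(signal, dim, delay)
    return [[math.dist(a, b) for b in trajectories] for a in trajectories]


def recurrence_plot(signal, dim=2, delay=1, threshold=None):
    """Distance image, or a 0/1 image when a threshold is given."""
    distances = distance_matrix(signal, dim, delay)
    if threshold is None:
        return distances
    return [[1 if d <= threshold else 0 for d in row] for row in distances]


signal = [0.0, 1.0, 0.0, 1.0, 0.0]   # hypothetical periodic samples
distances = distance_matrix(signal, dim=2, delay=1)
plot = recurrence_plot(signal, dim=2, delay=1, threshold=0.5)
```

For this periodic toy signal the binary plot is a checkerboard, which reflects how recurrences of the same state show up as repeated diagonal structure.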
Relative Position Matrix, RPM
RPM contains the redundant features of the original underwater target radiated noise, which makes it easier to capture the inter- and intra-class similarity information in the transformed image. For an underwater target radiated noise
(1) For the original underwater target radiated noise
where
(2) At this point, the standard normal distribution
Equation (12) allows a reduction from
(3) Construct an
The data sequence is represented by a matrix
(4) The application of min-max normalisation transforms
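The four RPM steps above can be sketched as follows. This is a minimal illustration that assumes z-score normalisation in step 1, piecewise aggregate approximation (PAA) with a fixed reduction factor in step 2, entries equal to the difference between two reduced samples in step 3, and min-max scaling to the grey-value range [0, 255] in step 4. The target range, the function names and the sample values are assumptions.

```python
import math


def z_score(signal):
    """Step 1: zero mean and unit variance (signal must not be constant)."""
    mean = sum(signal) / len(signal)
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / len(signal))
    return [(x - mean) / std for x in signal]


def paa(signal, factor):
    """Step 2: replace each block of `factor` samples by its mean."""
    blocks = [signal[i:i + factor] for i in range(0, len(signal), factor)]
    return [sum(block) / len(block) for block in blocks]


def relative_position_matrix(signal, factor=1):
    """Steps 3-4: M[i][j] = x_j - x_i, then min-max scaling to [0, 255]."""
    x = paa(z_score(signal), factor)
    m = [[xj - xi for xj in x] for xi in x]
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    return [[255.0 * (v - lo) / (hi - lo) for v in row] for row in m]


signal = [1.0, 2.0, 3.0, 4.0]   # hypothetical samples
image = relative_position_matrix(signal, factor=2)
```

Because every entry is a difference of two samples, the matrix is antisymmetric before scaling, which is the redundancy the method relies on.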
Wavelet transform
The wavelet transform provides a time-frequency window whose width varies with frequency, which can highlight certain features of the signal. The basic idea is to construct a finite-length or fast-decaying mother wavelet and then generate multiple daughter wavelets by scaling and translation, which are superimposed to match the input signal. Mapping the scale and translation parameters to frequency and time finally yields the time-frequency diagram of the signal.
where
According to the definition of the wavelet transform, given an underwater target radiated noise signal
(1) Determine the length of the radiated noise signal from the underwater target
(2) Calculate the maximum centre frequency
(3) According to the centre frequency and wavelet function, construct the wavelet curve, then convolve it with the original signal to obtain the time distribution vector of the current frequency and update the time-frequency matrix;
(4) Determine whether the current centre frequency is greater than the maximum centre frequency; if so, output the time-frequency matrix
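The loop over centre frequencies in steps (1) to (4) can be sketched as follows. The sketch assumes a complex Morlet wavelet with a fixed number of cycles, a linear grid of centre frequencies and zero padding at the signal edges. None of these choices is stated in the text, and the function names, the sampling rate and the test tone are hypothetical.

```python
import cmath
import math


def morlet(freq, fs, n_cycles=6.0):
    """Complex Morlet wavelet centred on `freq` Hz, sampled at `fs` Hz."""
    sigma = n_cycles / (2.0 * math.pi * freq)        # envelope width in seconds
    half = int(math.ceil(3.0 * sigma * fs))          # keep +/- 3 sigma
    lags = range(-half, half + 1)
    envelope = [math.exp(-((k / fs) ** 2) / (2.0 * sigma ** 2)) for k in lags]
    total = sum(envelope)                            # normalise envelope to sum 1
    return [g / total * cmath.exp(2j * math.pi * freq * k / fs)
            for g, k in zip(envelope, lags)]


def convolve_same(signal, kernel):
    """Convolution with zero padding; output has the length of `signal`."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for t in range(n):
        acc = 0j
        for k, w in enumerate(kernel):
            idx = t - (k - half)                     # kernel index k is lag k - half
            if 0 <= idx < n:
                acc += w * signal[idx]
        out.append(acc)
    return out


def time_frequency_matrix(signal, fs, f_min, f_max, f_step):
    """One row of magnitudes per centre frequency, from f_min up to f_max."""
    rows = []
    freq = f_min
    while freq <= f_max:                             # stop once f_max is exceeded
        rows.append([abs(v) for v in convolve_same(signal, morlet(freq, fs))])
        freq += f_step
    return rows


fs = 200                                             # hypothetical sampling rate
signal = [math.sin(2.0 * math.pi * 20.0 * n / fs) for n in range(200)]
tf = time_frequency_matrix(signal, fs, f_min=10.0, f_max=30.0, f_step=10.0)
```

For the 20 Hz test tone, the row for the 20 Hz wavelet carries most of the energy. The nested loops are written for readability and would be replaced by a vectorised or FFT-based convolution for segments of realistic length.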
The wavelet transform method has good time-frequency resolution and can emphasise the local characteristics of the underwater target radiated noise signal, that is, low frequency resolution and high time resolution are used at high frequencies. However, it still requires a frequency domain transformation and cannot reflect the overall characteristics of the underwater target radiated noise signal, so it is used only as a comparison method in this paper.
Experiments and results
Dataset introduction
The underwater target radiated noise signals used in this paper are simulated as in Lei et al. 14, with a sampling time of 1 s and a sampling frequency of 50 kHz. The data parameters are shown in Table 1 and the simulated underwater target radiated noise signal is shown in Figure 2.
Table 1. Parameters of the underwater target radiated noise signal simulation.

Figure 2. The simulated underwater target radiated noise signal.
Enhanced images obtained by different methods
In this paper, each enhanced image conversion method is applied to the above underwater target radiated noise signals. A sliding window of size 5000 and a sliding step of 5000 are used to segment each original signal into 100 segments, and the enhanced image conversion is performed on each data segment, so that a total of 500 enhanced images are generated for the five types of signals. Relevant examples are shown in Figure 3.
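The segmentation can be sketched as follows. With window size w and step s, a record of N samples yields floor((N - w) / s) + 1 windows, so non-overlapping windows of 5000 samples give 10 windows for a 50,000-sample record and 100 windows for a 500,000-sample record. The record length used below is an assumption chosen so that 100 windows result, and the function name is hypothetical.

```python
def sliding_windows(signal, window, step):
    """Return consecutive windows of `window` samples taken every `step` samples."""
    last_start = len(signal) - window
    return [signal[start:start + window] for start in range(0, last_start + 1, step)]


# Hypothetical record of 500,000 samples (all zeros, only the length matters here).
record = [0.0] * 500_000
segments = sliding_windows(record, window=5000, step=5000)
print(len(segments), len(segments[0]))
```

With a step equal to the window size the segments do not overlap; a smaller step would produce overlapping segments and therefore more images per record.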

Figure 3. Examples of enhanced images obtained by different methods: (a) Gramian Angular Fields, (b) Markov Transition Fields, (c) Recurrence Plots, (d) Relative Position Matrix and (e) wavelet transform.
Figure 3 shows the enhanced images obtained by each method from the same two segments of underwater target radiated noise after piecewise aggregate approximation (PAA). It can be seen from Figure 3 that the images produced by different methods bear no obvious relationship to one another, whereas within the same method there is an obvious difference between the enhanced images of the two different signals.
Classification model
In this paper, the strength of convolutional neural networks in image classification is used to build a classification model for underwater target radiated noise. The structural parameters of the convolutional neural network are shown in Table 2.
Table 2. The primary parameters of the convolutional neural network.
No processing such as compression or resizing is applied to the original enhanced images before they enter the convolutional neural network, in order to prevent loss of original image information.
Experimental results
In this work, for each enhanced image method, 80% of the converted images are used as the training set for model training and the remaining 20% are used as the test set for model testing. The computing environment is a single-CPU computer running the Windows 10 operating system. Figure 4 shows the classification results for the images converted by the different enhanced image methods, and Figure 5 shows the corresponding confusion matrices; these results are also summarised in Table 3.

Figure 4. Image classification results transformed by multiple enhanced image methods.

Figure 5. Confusion matrix of image classification results transformed by multiple enhanced image methods: (a) Gramian Angular Fields, (b) Markov Transition Fields, (c) Recurrence Plots, (d) Relative Position Matrix and (e) wavelet transform.
Table 3. Image classification results transformed by multiple enhanced image methods.
Figure 4, Figure 5 and Table 3 show that, for the same set of underwater target radiated noise signals converted by different methods and classified with the same model, the Gramian Angular Field method is significantly better than the other methods in both classification accuracy and computation time. The wavelet transform method is a time-frequency transform method, and its recognition accuracy is low because it depends too heavily on the mother wavelet function and the frequency resolution. None of the other methods is a time-frequency transform method. The GAF method converts a one-dimensional signal into angles and radii to produce a two-dimensional feature map; the converted images retain the time dependence and latent correlation features of the data while being highly sparse, which eliminates redundant information between multimodalities. The experimental results also show that the Gramian Angular Field exhibits strong flexibility and robustness in the analysis of underwater target radiated noise.
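The relation between a confusion matrix and the reported classification accuracy can be sketched as follows. The labels below are hypothetical and do not reproduce the values of Figure 5 or Table 3; rows are taken to index the true class and columns the predicted class, which is an assumption about the figure layout.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Count matrix with rows = true class and columns = predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """For each true class, the fraction of its samples classified correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Hypothetical test labels for three classes.
true_labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
predicted = [0, 0, 0, 1, 1, 1, 1, 1, 2, 0]
cm = confusion_matrix(true_labels, predicted, n_classes=3)
print(cm, accuracy(cm), per_class_recall(cm))
```

Per-class recall shows which signal types account for the errors behind a single overall accuracy figure.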
Conclusion
In this paper, several enhanced image conversion methods are compared on existing underwater target radiated noise signals, and the converted images are classified and recognised with a convolutional neural network. Time-frequency and non-time-frequency conversion methods are compared. The time-frequency conversion method is limited by frequency domain resolution and by the time window, which makes its classification results inferior to those of the non-time-frequency conversion methods; it can also focus only on local information and cannot capture global information, which further limits its application. Non-time-frequency conversion methods do not require data pre-processing, retain global information, are not limited by frequency domain resolution, do not rely on time window selection, and are suited to nonlinear relationships. These advantages are combined with the excellent performance of convolutional neural networks in image processing, which effectively learn the spatial characteristics and local patterns of an image and extract features such as texture, shape and edges that support accurate classification and recognition. As a result, the recognition accuracy of the GAF enhanced image conversion method on underwater target radiated noise reaches 83%, and the single-core CPU computation time is reduced to 4253 s, which is much better than the other methods.
Because the method described in this paper can be more effective than the time-frequency conversion method, it may also be generalised to various fields of audio signal processing.
Footnotes
Handling Editor: Chenhui Liang
Author contributions
Conceptualisation, Lei Zhufeng and Lei Xiaofang; methodology, Lei Zhufeng and Zhou Chuanghui; data curation, Wang Jialei; writing – original draft preparation, Lei Zhufeng.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by National Natural Science Foundation of China, grant number 52105068 and Natural Science Basic Research Program of Shaanxi Province, grant number 2023-JC-YB-472.
