Abstract
A running vehicle produces a complex sound emission from the engine, the exhaust, the air conditioner, and other mechanical parts. Analysis of this sound for the purpose of vehicle identification has security- and transportation-related applications. Engine speed variation, which shifts the frequency content of the emissions, prevents Fourier-based methods from providing a stable signature for the vehicle. We search for an engine speed–independent acoustic signature for the vehicle, and for this purpose, we propose wavelet packet analysis rather than traditional time- or frequency-domain methods. Wavelet packet analysis, by providing arbitrary time–frequency resolution, enables analysis of both stationary and non-stationary signals. It has better time representation than Fourier analysis and better high-frequency resolution than wavelet analysis. Under varying engine speed, sound emissions are recorded from four cars and analyzed by wavelet packet analysis. Wavelet packet analysis subimages are further analyzed to obtain feature vectors in the form of log energy entropy, norm entropy, and energy. These feature vectors are fed into a classifier, a multilayer perceptron, for evaluation. While norm entropy achieves a classification rate of 100%, log energy entropy and energy achieve classification rates of 99.26% and 97.79%, respectively. These results indicate that wavelet packet analysis along with norm entropy and a multilayer perceptron provides an accurate vehicle-specific acoustic signature independent of the engine speed.
I. Introduction
A vehicle, when running, produces a complex sound emission from its engine, exhaust, air conditioner, and other mechanical parts and, if it is moving, from tires and air friction. Although its parameters vary widely, this sound carries diagnostic value for vehicle detection, vehicle identification, and fault diagnosis. The possibility of identifying and diagnosing vehicles based on the analysis of sound emissions has been investigated by researchers, and various methods have been proposed to analyze the signals. Below is a chronological review of time-domain, frequency-domain, time–frequency domain, and hybrid analysis methods used in vehicle acoustic signal analysis.
Some of the researchers used time-domain features, for example, Mazarakis and Avaritsiotis 1 used a time-domain encoding and feature extraction to classify tracked vehicles and heavy trucks using their acoustic and seismic signatures; Paulraj et al. 2 used multi-frame time-domain features for classification of moving vehicles; Wang and Zhou 3 used improved time-encoded signal-processing algorithm for feature extraction in acoustic vehicle-type recognition; Rahim et al. 4 used time-domain features for the classification of moving vehicles; Paulraj et al. 5 used autoregressive modeling for vehicle-type classification; Mayvan et al. 6 used quadratic discriminant analysis to classify audio signals of passing vehicles to bus, car, motor, and truck categories based on features such as short-time energy, average zero-crossing rate, and pitch frequency of periodic segments of signals; and Ishida et al. 7 used dynamic time warping to design an acoustic vehicle-count system.
Some of the researchers used frequency-domain features, for example, Nooralahiyan et al. 8 used linear predictive coding coefficients, auditory model processing and Fourier transform for acoustic signature analysis for vehicle classification; Wu et al. 9 used frequency vector principal component analysis for vehicle sound signature recognition; Nooralahiyan et al. 10 used linear predictive coding for vehicle classification by acoustic signature; Munich 11 used a probabilistic classifier that is trained on the principal components subspace of the short-time Fourier transform for acoustic signature recognition of vehicles; Sun and Daigle 12 used fast Fourier transform (FFT) magnitudes for vehicle classification; Yang et al. 13 used overall shape of sound spectrum for vehicle identification; Malhotra et al. 14 used features based on FFT and power spectral density (PSD) to classify running vehicles; Yang et al. 15 used discrete spectrums to identify vehicles in wireless sensor networks; Lu et al. 16 used spectral features to classify running vehicle types into gasoline light-wheeled, gasoline heavy-wheeled, diesel truck and motorcycle; Lu et al. 17 used gammatone filterbanks as spectral features for detecting acoustic signature of approaching vehicles; Malhotra et al. 18 used PSD-based features for classifications in audio-sensor networks; Guo et al. 19 used a number of harmonic components and a group of key frequency components for ground vehicle classification; Bikdash et al. 20 used tristimulus response for classification of civilian vehicles from acoustic data; Malhotra et al. 21 used Aura matrices to create a new feature derived from the PSD and dynamic multidimensional PSD for vehicle classification by wireless audio-sensor networks; Kozhisseri and Bikdash 22 used spectral features for the classification of civilian vehicles using acoustic sensors; Changjun and Yuzong 23 used short-time Fourier transform of acoustic and seismic signals for vehicle classification; Rahim et al. 24 used one-third octave filter bands for moving vehicle noise classification; Özgündüz et al. 25 used Mel-frequency cepstral coefficients for vehicle identification using acoustic and seismic signals; Mato-Mendez and Sobreira-Seoane 26 used Mel-frequency cepstral coefficients, sub-band energy ratio, spectral centroid, and spectral roll-off point to classify vehicles; Zhao et al. 27 used PSD to recognize status of ball mill load using shell vibration signal; Bhave and Rao 28 used formant-based feature and Mel-frequency cepstral coefficients for vehicle engine sound analysis applied to traffic congestion estimation; Gorski and Zarzycki 29 used harmonic line, Schur coefficients and Mel filters methods for feature extraction in acoustic vehicle classification; Guo et al. 30 used a number of harmonic components and a group of key frequency components for ground vehicle classification; Rahim et al. 31 used one-third octave filter band for moving vehicle classification; Rahim et al. 31 used one-third octave frequency spectrum analysis to classify type and the distance of a moving vehicle with types: car, bike, lorry, and truck; Biernacki 32 used harmonic features and correlation features for acoustic vehicle identification; Zambon et al. 33 used a database of vehicles to statistically analyze one-third octave band spectra of emission noise in terms of its relevance to vehicle speed; Sunu and Percus 34 used frequency signatures to classify vehicles; and Zambon et al. 35 used statistical analysis of recorded spectra of vehicles to detect their relevance to vehicle speed.
Some of the researchers used time–frequency domain features, for example, Averbuch et al. 36 used wavelet packet energy for the classification and detection of moving vehicles; Lu et al. 37 assembled gammatone feature vectors over multiple temporal frames to establish a high dimensional spectro-temporal representation for noise-independent vehicle sound recognition; Averbuch et al. 38 used energy of wavelet packet transform (WPT) for acoustic detection of moving vehicles; and Schclar et al. 39 used total magnitude (L1 norm) of the coefficients from wavelet packet decomposition (WPD) to detect vehicles.
Some of the researchers used hybrid features, for example, Aljaafreh and Dong 40 used PSD of short-time Fourier spectrum and energy of WPT for vehicle classification based on acoustic signals; Padmavathi et al. 41 used signal energy, energy entropy, zero-crossing rate, spectral roll-off, spectral centroid, and spectral flux for vehicle acoustic signal classification; Aljaafreh and Al-Fuqaha 42 used spectrum analysis and energy of WPT for acoustic classification of multiple targets; George et al. 43 used short-time energy, log energy, and smoothed log energy for vehicle detection and Mel-frequency cepstral coefficients for vehicle classification; Kakar and Kandpal 44 reviewed time-domain, frequency-domain and time–frequency domain feature extraction methods that are used in classification of vehicles; and Shah and Mehta 45 used time-domain and frequency-domain features for the analysis of acoustic signals for vehicle classification of four wheeler models.
Time-domain, frequency-domain (Fourier), and time–frequency domain (wavelet) analyses are the main tools for analyzing signals, but Fourier analysis has poor time representation and wavelet analysis has poor resolution at high frequency. Wavelet packet analysis (WPA) overcomes both limitations, and its arbitrary time–frequency resolution enables analysis of both stationary and non-stationary signals.
In search for an engine speed–independent acoustic signature for vehicles, we propose a method based on WPA of the sound signals for the aforementioned reasons. Moreover, since frequency content of the signals shifts under varying engine speeds, Fourier-based methods would not provide a stable signature. To apply the proposed method, we record sound emissions from four vehicles by varying the engine speed. Output of the WPA is a set of subsignals. We analyze these subsignals further using norm entropy, log energy entropy, and energy. For classification, we feed these feature vectors to a multilayer perceptron, which is a kind of neural network that is able to learn patterns from a set of data and generalize to a new set. It has been shown to be effective in pattern recognition and classification problems. We evaluate the three features given for their classification accuracy.
II. Material
Sound emissions are recorded from four vehicles: a Ford Kuga 1.5 L Ecoboost, a Skoda Fabia 1.4 L, a Volkswagen Passat 1.6 L TDi, and a Ford Tourneo Courier 1.5 L TDCi. Measurements are made using a digital recorder attached to a microphone with a sampling rate of 44,100 Hz. Recordings are taken from a location 1 m away from the engine on the left side of the vehicle. We choose this as a reasonable distance to be used in a real vehicle detection and identification system. Since the minimum idle running speed of the vehicle engines is 800 r/min, we start at that rate and increase up to 3000 r/min. The range 800–3000 r/min is chosen because it is the usual operating range of vehicle engines except under aggressive driving. We increase the engine speed slowly from 800 r/min up to 3000 r/min and then decrease it back, and we repeat this cycle two times. We would like about 30 training samples and 30 test samples from each vehicle, each of which is 0.907 s long in order to contain 40,000 time steps. We choose this window width based on our earlier experience in similar problems. In order to obtain the aforementioned training and test samples, we arrange each increase and decrease episode to be roughly 30 s. Figure 1 shows 1 min recordings from each vehicle, which include two episodes of increase and decrease of engine speed. The first increase and decrease episodes are used to extract training samples, and the second episodes are used to extract test samples. This procedure ensures that training and test samples contain sound emissions from the whole range of engine speeds between 800 and 3000 r/min. The total number of training samples is 130, and the total number of test samples is 136.

Figure 1. Sound signals recorded from four vehicles.
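The windowing arithmetic in this section can be sketched as follows. This is a minimal illustration, not the author's extraction procedure: the sampling rate and window length come from the text, while the use of consecutive non-overlapping windows, the function name `split_into_windows`, and the silent 30 s placeholder episode are assumptions.

```python
SAMPLE_RATE_HZ = 44_100   # sampling rate stated in the text
WINDOW_SAMPLES = 40_000   # time steps per sample stated in the text


def split_into_windows(signal, window=WINDOW_SAMPLES):
    """Cut a sequence into consecutive non-overlapping windows.

    Non-overlapping windows are an assumption of this sketch; any
    incomplete window at the end is dropped.
    """
    n_windows = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(n_windows)]


# Duration of one window in seconds: 40,000 / 44,100.
window_seconds = WINDOW_SAMPLES / SAMPLE_RATE_HZ

# Hypothetical placeholder: a silent episode of exactly 30 s.
episode = [0.0] * (30 * SAMPLE_RATE_HZ)
windows = split_into_windows(episode)

print(round(window_seconds, 3), len(windows))
```

Under these assumptions a 30 s episode yields 33 full windows, which is in line with the target of about 30 samples per vehicle per episode.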
III. Method
Our analysis and classification workflow is such that each signal is first decomposed into wavelet packet subsignals by WPA. Then features are extracted from these subsignals in the form of norm entropy, log energy entropy, and energy. To evaluate these features we feed them into a classifier, multilayer perceptron (MLP). During this procedure, we fine tune the following parameters: WPA depth, mother wavelet used, which nodes (all or final) of WPA to include in the analysis, and the neural network parameters.
We analyze the contribution of each component of the feature vector, which corresponds to a node of WPA, by a barplot of the mean values and then a boxplot of the feature at each node of the WPA for each vehicle. These let us decide whether a reduction in the feature vectors or further fine-tuning of parameters is needed.
A. WPD
WPD is used to decompose the signals. Wavelet packets are a generalization of wavelet bases by taking linear combinations of wavelet functions. 46 In the following explanation, we take a parallel approach to Yen and Lin 47 and Wu and Liu. 48
A wavelet function has three indices,
The first two wavelet packet functions are a scaling function and the mother wavelet function
Wavelet packet functions with higher oscillation parameters are
where
WPD is applied as shown in Figure 2 for three levels. The left hand–side sub-branches are obtained by low-pass filter

Figure 2. WPD tree up to three levels.
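The equations of this subsection are not reproduced in the text above. As a reference point only, the standard textbook formulation of wavelet packets can be written as below. The symbols are assumptions of this sketch and may differ from the author's notation: $j$ is the scale index, $k$ the translation index, $n$ the oscillation parameter, $\varphi$ the scaling function, $\psi$ the mother wavelet, and $h$, $g$ the low-pass and high-pass quadrature mirror filters.

```latex
\begin{align}
  W_{j,k}^{n}(t) &= 2^{j/2}\, W^{n}\!\left(2^{j} t - k\right) \\
  W^{0}(t) &= \varphi(t), \qquad W^{1}(t) = \psi(t) \\
  W^{2n}(t) &= \sqrt{2} \sum_{k} h(k)\, W^{n}(2t - k) \\
  W^{2n+1}(t) &= \sqrt{2} \sum_{k} g(k)\, W^{n}(2t - k) \\
  g(k) &= (-1)^{k}\, h(1 - k)
\end{align}
```

In this formulation, each node of the tree is split into a low-pass child (filter $h$) and a high-pass child (filter $g$), so a decomposition of depth $d$ has $2^{d}$ final nodes.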
B. Feature
Norm entropy, log energy entropy, and energy are used as feature vectors for prediction. Entropy and energy are common measures used in signal processing, which are able to extract useful information from a signal
where
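The feature definitions are not reproduced in the text above. The sketch below assumes the commonly used definitions of these quantities (the same ones documented for MATLAB's `wentropy`): energy as the sum of squared coefficients, log energy entropy as the sum of the logarithms of squared coefficients with the convention log(0) = 0, and norm entropy as the sum of |s_i|^p with p >= 1. The function names and the example value p = 1.5 are hypothetical; the exponent used by the author is not stated in this text.

```python
import math


def energy(coeffs):
    """Sum of squared coefficients."""
    return sum(c * c for c in coeffs)


def log_energy_entropy(coeffs):
    """Sum of log(c^2), using the convention log(0) = 0 (assumed definition)."""
    # log(c^2) is computed as 2*log|c| to avoid underflow of c*c for tiny c.
    return sum(2.0 * math.log(abs(c)) for c in coeffs if c != 0.0)


def norm_entropy(coeffs, p):
    """Sum of |c|^p with p >= 1 (assumed definition)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(c) ** p for c in coeffs)


def feature_vector(subsignals, feature):
    """Apply one scalar feature to every subsignal, one value per WPD node."""
    return [feature(s) for s in subsignals]


# Hypothetical toy subsignals standing in for wavelet packet coefficients.
subsignals = [[3.0, 4.0], [1.0, -4.0]]
print(feature_vector(subsignals, energy))
print(feature_vector(subsignals, lambda s: norm_entropy(s, p=1.5)))
```

Each feature maps one subsignal to one number, so a decomposition with 32 final nodes produces a 32-component feature vector per recording window.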
C. Classifier: MLP
Various tools including neural networks are available for classification. We choose MLP with backpropagation learning which can efficiently process large data sets and has been shown to be effective in pattern recognition and classification.50,51
MLP is a network of nodes arranged in layers. A node can be modeled as an artificial neuron that computes weighted sums of inputs with bias and presents it to an activation function. A general MLP model is shown in Figure 3. Linear activation functions were used for input and output layers and hyperbolic tangent sigmoid activation function for the hidden layer(s) which are in the form
where a is the input to the neuron. Training of the MLP is the adjustment of the weight parameters to map the input to the output with minimum error. For this purpose, backpropagation algorithm is adopted where the error between the actual output of the network and the target is backpropagated through the network to adjust the weight parameters.52,53

Figure 3. General architecture of the MLP.
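The forward pass of such a network can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the hyperbolic tangent sigmoid is taken in its usual form 2 / (1 + exp(-2a)) - 1, which equals tanh(a); the layer sizes use 32 inputs and hidden layers of 8 and 4 neurons as reported in the Results section, while the 4 output neurons (one per vehicle), the random weights, and all function names are hypothetical. Training by backpropagation is omitted.

```python
import math
import random


def tansig(a):
    """Hyperbolic tangent sigmoid, 2 / (1 + exp(-2a)) - 1, i.e. tanh(a)."""
    return math.tanh(a)


def linear(a):
    """Linear (identity) activation."""
    return a


def dense(inputs, weights, biases, activation):
    """One layer: weighted sum of inputs plus bias, passed to the activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained placeholder values)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Hidden layers use tansig; the output layer is linear."""
    for i, (weights, biases) in enumerate(layers):
        is_output = i == len(layers) - 1
        x = dense(x, weights, biases, linear if is_output else tansig)
    return x


rng = random.Random(0)
sizes = [32, 8, 4, 4]  # inputs, two hidden layers, assumed one output per vehicle
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

scores = forward([0.1] * 32, layers)
predicted = scores.index(max(scores))  # index of the highest-scoring class
print(len(scores), predicted)
```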
IV. Results and Discussion
We start the WPD of the signals at level 1 and increase the depth up to level 5, which gives the best classification accuracy. We search the Daubechies family of mother wavelets by starting with the simplest and increasing complexity until reaching the best accuracy at db4. First, we include all nodes of WPD up to the final level and then switch to using only the final nodes, which gives better accuracy.
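The tree structure behind the 32 final nodes at depth 5 can be sketched in a few lines. To keep the example self-contained, it uses the Haar wavelet as a stand-in for db4, so it illustrates the full-tree splitting only and does not reproduce the decomposition used in this study. The function names and the synthetic test signal are hypothetical. In practice a wavelet library would be used instead, for example the `WaveletPacket` class of PyWavelets with `wavelet='db4'` and `maxlevel=5`.

```python
import math


def haar_split(signal):
    """One analysis step with the Haar filters (stand-in for db4).

    Returns the low-pass (scaled sum) and high-pass (scaled difference)
    halves, each downsampled by two.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    low = [(signal[i] + signal[i + 1]) * s for i in pairs]
    high = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return low, high


def wavelet_packet_leaves(signal, depth):
    """Split every node at every level and return the 2**depth final nodes.

    Nodes are returned in natural (filter-bank) order, which is not the
    same as increasing-frequency order.
    """
    nodes = [list(signal)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            low, high = haar_split(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return nodes


# Hypothetical synthetic signal: two sinusoids, 1024 samples.
signal = [math.sin(0.05 * n) + 0.5 * math.sin(1.3 * n) for n in range(1024)]
leaves = wavelet_packet_leaves(signal, depth=5)
print(len(leaves), len(leaves[0]))
```

Unlike the ordinary wavelet transform, which splits only the low-pass branch, every node is split here, which is what gives the finer resolution at high frequencies.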
Figure 4 shows the WPD of signals at 800, 2000, and 3000 r/min for each vehicle. Each horizontal line represents a subimage, and low-frequency nodes are in the upper region. Visually, vehicles 3 and 4 are easily differentiable from each other and from vehicles 1 and 2. The MLP parameters, namely the number of hidden layers and the number of neurons, are determined by the method of pruning: we start with one hidden layer of one neuron and increase the size until reaching the best classification accuracy at two layers with eight and four neurons.

Figure 4. WPD of signals at speeds 800, 2000, and 3000 r/min.
Best-performing parameter values and accuracies achieved.
WPD: wavelet packet decomposition; MLP: multilayer perceptron.
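As a consistency check, the classification rates reported in the abstract can be compared with the number of test samples given in the Material section. The sketch below assumes that the rates are computed over the 136 test samples; the counts of correctly classified samples are an inference from the reported percentages, not figures stated by the author.

```python
TEST_SAMPLES = 136  # number of test samples stated in the Material section


def accuracy_percent(correct, total=TEST_SAMPLES):
    """Classification rate in percent, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


# Inferred counts: 0, 1, and 3 misclassified test samples.
for correct in (136, 135, 133):
    print(correct, accuracy_percent(correct))
```

Under this assumption, 100%, 99.26%, and 97.79% correspond to 0, 1, and 3 misclassified test samples, respectively.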
Norm entropy achieves 100% classification accuracy with
We plot the mean values of norm entropy corresponding to the final 32 nodes of WPD for each vehicle in Figure 5. The mean of the feature vector is clearly different for each vehicle. In order to see the contribution of each node of WPD, we boxplot the norm entropy of the four vehicles at each node of the WPD in Figure 6. The central mark of each box is the median, the edges are the 25th and 75th percentiles, and the whiskers cover the most extreme data points that are not outliers. Outliers are plotted individually. In these plots, the norm entropy range of each vehicle differs from those of the others at most of the nodes. However, it is hard to estimate the contribution of each node, so we keep all of the nodes in the analysis and make no reduction in the feature space. Reduction, which might reduce the complexity of the method, can be considered as future work, but it is not essential since we have already reached 100% classification accuracy. We therefore conclude that WPA followed by norm entropy and MLP provides a precise engine speed–independent acoustic signature for the vehicles studied. Figure 7 shows the training performance of the best-performing MLP for the classification when the feature is norm entropy.

Figure 5. Mean values of norm entropy for final 32 nodes of WPD for four vehicles.

Figure 6. Boxplot of norm entropy at final 32 nodes of WPD at depth 5 for four vehicles.

Figure 7. Training performance of the MLP with two hidden layers of 8 and 4 neurons using norm entropy.
V. Conclusion
An approach for defining an engine speed–independent acoustic signature for vehicles is presented. Wavelet packet analysis is chosen as the analysis tool and is explored for different wavelet bases at various depths. Daubechies db4 mother wavelet function is found to give the best classification accuracy at depth 5. Wavelet packet subimages are analyzed further using three features: norm entropy, log energy entropy, and energy. These features are evaluated for their classification accuracy using an MLP as a classifier, and norm entropy is found to give the best result of 100% classification rate with
Footnotes
Acknowledgements
The author would like to thank Mr Muammer Göksu, Mr Mustafa Turgut, and Ms Azize Yasemin Göksu Erol for making their vehicles available for measurements of this manuscript.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
