Abstract
Estimating vehicle speed from drive-by noise is a known technique. Most such methods estimate the velocity of the vehicle with respect to the microphone(s), so they rely on the relative motion between the two. Other methods do not: recent research has shown a statistical correlation between vehicle speed and the spectra of drive-by noise emissions. Because this correlation does not depend on relative motion, it suggests that vehicle speed could be predicted from an on-board microphone, which could lead to a new kind of speed sensor. To test this, we record the sound of a vehicle with an on-board microphone while varying its speed. Vehicle sound emissions are complex: they originate from the engine, the exhaust, the air conditioner, other mechanical parts, the tires, and air resistance, and they carry both stationary and non-stationary information. We therefore analyze them with wavelet packet analysis rather than traditional time-domain or frequency-domain methods. By providing arbitrary time–frequency resolution, wavelet packet analysis can handle both stationary and non-stationary signals; it has better time representation than Fourier analysis and better high-frequency resolution than wavelet analysis. Norm Entropy, Log Energy Entropy, and Energy are extracted from the wavelet packet subsignals as features and evaluated by feeding them into a multilayer perceptron. Norm Entropy gives the best prediction, with an average accuracy of 97.89% and a mean absolute error of 1.11 km/h, which corresponds to a relative error of 2.11%. The time sensitivity is ±0.453 s and can be improved by varying the window width. The results indicate that, with further tests at other speed ranges, with other vehicles, and under dynamic conditions, this method can be extended to the design of a new kind of vehicle speed sensor.
Introduction
A vehicle can produce a complex sound emission from its engine, the exhaust, the air conditioner, tires, other mechanical systems, and air friction. The possibility of making diagnosis or identification of vehicles based on sound emission analysis has been investigated by researchers.
Vehicle speed is an important parameter in vehicle noise emission, and speed estimation from the drive-by noise of vehicles has been investigated by several researchers. Some of these techniques are based on the Doppler shift,1 which depends on the relative motion of the vehicle with respect to the receiver microphone. Other techniques use signals from microphone arrays and/or omnidirectional microphones2–6 and also rely on the relative motion of the vehicle with respect to the receiver microphone(s). A few techniques still use drive-by noise but do not rely on this relative motion.7–9 Recently, Zambon et al. 7 used a database of vehicles to statistically analyze 1/3-octave band spectra of emission noise in terms of its relevance to vehicle speed, and used recorded spectra of vehicles to detect a statistical correlation between vehicle speed and noise spectra.8 Because their method does not rely on the relative motion of the vehicle with respect to the microphone(s), it inspires us to consider a similar approach to estimate the speed of a vehicle from sound signals recorded by an on-board microphone. This may result in a new kind of speed sensor.
Identification, diagnosis, and parameter estimation of vehicles based on the analysis of sound emissions have been investigated by researchers, and various methods have been proposed for these analyses. Below is a chronological review of the time-domain, frequency-domain, time–frequency domain, and hybrid analysis methods used in vehicle acoustic signal analysis.
Some of the researchers used time-domain features, for example, Mazarakis and Avaritsiotis 10 used a time-domain encoding and feature extraction to classify tracked vehicles and heavy trucks using their acoustic and seismic signatures; Paulraj et al. 11 used multi-frame time-domain features for classification of moving vehicles; Wang and Zhou 12 used an improved time-encoded signal-processing algorithm for feature extraction in acoustic vehicle-type recognition; Rahim et al. 13 used time-domain features for the classification of moving vehicles; Paulraj et al. 14 used autoregressive modeling for vehicle-type classification; Mayvan et al. 15 used quadratic discriminant analysis to classify audio signals of passing vehicles to bus, car, motor, and truck categories based on features such as short-time energy, average zero-crossing rate, and pitch frequency of periodic segments of signals; and Ishida et al. 16 used dynamic time warping to design an acoustic vehicle-count system.
Some of the researchers used frequency-domain features, for example, Nooralahiyan et al. 17 used linear predictive coding coefficients, auditory model processing, and Fourier transform for acoustic signature analysis for vehicle classification; Wu et al. 18 used frequency vector principal component analysis for vehicle sound signature recognition; Nooralahiyan et al. 19 used linear predictive coding for vehicle classification by acoustic signature; Munich 20 used a probabilistic classifier that is trained on the principal components subspace of the short-time Fourier transform for acoustic signature recognition of vehicles; Sun and Daigle 21 used fast Fourier transform (FFT) magnitudes for vehicle classification; Yang et al. 22 used overall shape of sound spectrum for vehicle identification; Malhotra et al. 23 used features based on FFT and power spectral density (PSD) to classify running vehicles; Yang et al. 24 used discrete spectrums to identify vehicles in wireless sensor networks; Lu et al. 25 used spectral features to classify running vehicle types into gasoline light-wheeled, gasoline heavy-wheeled, diesel truck, and motorcycle; Lu et al. 26 used gammatone filterbanks as spectral features for detecting acoustic signature of approaching vehicles; Malhotra et al. 27 used PSD-based features for classifications in audio-sensor networks; Guo et al. 28 used a number of harmonic components and a group of key frequency components for ground vehicle classification; Bikdash et al. 29 used tristimulus response for classification of civilian vehicles from acoustic data; Malhotra et al. 30 used Aura matrices to create a new feature derived from the PSD and dynamic multidimensional PSD for vehicle classification by wireless audio-sensor networks; Kozhisseri and Bikdash 31 used spectral features for the classification of civilian vehicles using acoustic sensors; Changjun and Yuzong 32 used short-time Fourier transform of acoustic and seismic signals for vehicle classification; Rahim et al. 33 used one-third octave filter bands for moving vehicle noise classification; Özgündüz et al. 34 used Mel-frequency cepstral coefficients for vehicle identification using acoustic and seismic signals; Mato-Méndez and Sobreira-Seoane 35 used Mel-frequency cepstral coefficients, sub-band energy ratio, spectral centroid, and spectral roll-off point to classify vehicles; Zhao et al. 36 used PSD to recognize status of ball mill load using shell vibration signal; Bhave and Rao 37 used formant-based feature and Mel-frequency cepstral coefficients for vehicle engine sound analysis applied to traffic congestion estimation; Górski and Zarzycki 38 used Harmonic line, Schur coefficients, and Mel filters methods for feature extraction in acoustic vehicle classification; Guo et al. 39 used a number of harmonic components and a group of key frequency components for ground vehicle classification; Rahim et al. 40 used one-third octave filter band for moving vehicle classification; Rahim et al. 40 used one-third octave frequency spectrum analysis to classify type and the distance of a moving vehicle with types: car, bike, lorry, and truck; Biernacki 41 used harmonic features and correlation features for acoustic vehicle identification; Zambon et al. 7 used a database of vehicles to statistically analyze one-third octave band spectra of emission noise in terms of its relevance to vehicle speed; Sunu and Percus 42 used frequency signatures to classify vehicles; and Zambon et al. 8 used statistical analysis of recorded spectra of vehicles to detect its relevance to vehicle speed.
Some of the researchers used time–frequency domain features, for example, Averbuch et al. 43 used wavelet packet energy for the classification and detection of moving vehicles; Lu et al. 44 assembled gammatone feature vectors over multiple temporal frames to establish a high-dimensional spectro-temporal representation for noise-independent vehicle sound recognition; Averbuch et al. 45 used energy of wavelet packet transform (WPT) for acoustic detection of moving vehicles; and Schclar et al. 46 used total magnitude (L1 norm) of the coefficients from wavelet packet decomposition (WPD) to detect vehicles.
Some of the researchers used hybrid features, for example, Aljaafreh and Dong 47 used PSD of short-time Fourier spectrum and energy of WPT for vehicle classification based on acoustic signals; Padmavathi et al. 48 used signal energy, energy entropy, zero-crossing rate, spectral roll-off, spectral centroid, and spectral flux for vehicle acoustic signal classification; Aljaafreh and Al-Fuqaha 49 used spectrum analysis and energy of WPT for acoustic classification of multiple targets; George et al. 50 used short-time energy, log energy, and smoothed log energy for vehicle detection and Mel-frequency cepstral coefficients for vehicle classification; Kakar and Kandpal 51 reviewed time-domain, frequency-domain, and time–frequency domain feature extraction methods that are used in classification of vehicles; and Shah and Mehta 52 used time-domain and frequency-domain features for the analysis of acoustic signals for vehicle classification of four-wheeler models.
Time-domain, frequency-domain (Fourier), and time–frequency domain (wavelet) analyses are the main tools for analyzing signals, but Fourier analysis has poor time representation and wavelet analysis has poor resolution at high frequencies. Wavelet packet analysis (WPA) overcomes both limitations, and its arbitrary time–frequency resolution enables the analysis of both stationary and non-stationary signals.
In this work, we aim to predict the speed of a vehicle from its sound emissions recorded by an on-board microphone. For this purpose, we record the sound of a vehicle while varying and recording its speed. Because of the advantages mentioned above, we perform the analysis with WPA. The output of the WPA is a set of subsignals whose number depends on the depth of the WPA. From these subsignals, we extract several features, namely Energy, Log Energy Entropy, and Norm Entropy. We evaluate these features and choose the one that gives the best prediction. To map the feature vectors to vehicle speed, we use a multilayer perceptron (MLP), a kind of neural network that has proven successful in black box modeling and function approximation. Although WPA has been used together with energy in past studies of vehicle acoustic analysis,43,45 to our knowledge this study is the first to use entropy together with WPA in vehicle acoustic emission analysis.
Material
Sound emissions are recorded from a Ford Kuga 1.5 Ecoboost, using its cruise control to set the speed in 1 km/h increments from 30 km/h up to 80 km/h. The microphone is placed inside the vehicle on the front passenger’s seat, with the windows closed and the air conditioner on, which is a significant noise source. Recording is done with a digital recorder at 44,100 Hz for 30 s at each speed step. The recording from each speed step is partitioned into 40 windows, providing 20 parts for training and 20 parts for testing. A total of 475 sets for training and 475 sets for testing are obtained. Figure 1 displays one set of data from each of the 30, 40, 50, 60, 70, and 80 km/h speed steps. Since each window is 0.907 s long, we have a time sensitivity of ±0.453 s.

Figure 1. Sound signals recorded at speeds 30, 40, 50, 60, 70, and 80 km/h.
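The relation between window length, sample count, and time sensitivity stated above can be checked with a few lines of arithmetic. The sketch below only uses the sampling rate and window length given in the text; the function names are illustrative, and reading the time sensitivity as half the window length is an assumption based on the ±0.453 s figure.

```python
SAMPLE_RATE_HZ = 44_100   # recorder sampling rate stated in the text
WINDOW_S = 0.907          # window length stated in the text


def window_samples(window_s: float, rate_hz: int) -> int:
    """Number of samples in one analysis window, rounded to the nearest integer."""
    return round(window_s * rate_hz)


def time_sensitivity(window_s: float) -> float:
    """Half the window length (assumed reading of the +/- figure in the text)."""
    return window_s / 2


print(window_samples(WINDOW_S, SAMPLE_RATE_HZ))  # 39999
print(time_sensitivity(WINDOW_S))                # about 0.4535 s
```

Under this reading, halving the window width would halve the time sensitivity, at the cost of fewer samples per window for the wavelet packet analysis.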
Method
Our analysis and prediction workflow is as follows. Each signal is first decomposed into wavelet packet subsignals using WPA. Features are then extracted from these subsignals in the form of Norm Entropy, Log Energy Entropy, and Energy. To select the best of these features, we feed them into the prediction tool, the MLP, and use its prediction accuracy to fine-tune the parameters: the WPA depth, the mother wavelet, which nodes of the WPA (all or final) to include in the prediction, and the neural network parameters. After this step, we analyze the contribution of each component of the feature vector, each of which corresponds to a node of the WPA, first by plotting the mean of the feature vectors at different speed steps and then by box plotting the features at each node for varying speeds. These results let us decide whether further reduction of the feature space or further fine-tuning of the parameters is needed.
WPD
WPD is used to decompose the signals. Wavelet packets generalize wavelet bases by taking linear combinations of wavelet functions.53 In the following explanation, we follow the approach of Yen and Lin 54 and Wu and Liu.55
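A wavelet packet function has three indices: j, the scale index (integer); k, the translation index (integer); and n, the oscillation parameter; t is time

$$W_{j,k}^{n}(t) = 2^{j/2}\, W^{n}\left(2^{j} t - k\right)$$

The first two wavelet packet functions are the scaling function $\varphi(t)$ and the mother wavelet function $\psi(t)$

$$W^{0}(t) = \varphi(t), \qquad W^{1}(t) = \psi(t)$$

Wavelet packet functions with higher oscillation parameters are

$$W^{2n}(t) = \sqrt{2} \sum_{k} h(k)\, W^{n}(2t - k), \qquad W^{2n+1}(t) = \sqrt{2} \sum_{k} g(k)\, W^{n}(2t - k)$$

where h(k) and g(k) are quadrature mirror filters 56 associated with the scaling function and the mother wavelet function. The wavelet packet coefficients are defined as the inner product of the wavelet packet functions with the input signal f(t), which also defines the range of t

$$w_{j,n,k} = \left\langle f, W_{j,k}^{n} \right\rangle = \int f(t)\, W_{j,k}^{n}(t)\, dt$$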
WPD is applied as shown in Figure 2 for three levels. The left-hand sub-branches are obtained by the low-pass filter h(k) followed by decimation; the right-hand sub-branches are obtained by the high-pass filter g(k) followed by decimation. S is the original signal, A stands for approximation, D stands for detail, and the number indicates the level.

Figure 2. WPD tree up to three levels.
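The filter-and-decimate tree described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the implementation used in the study: it swaps in the two-tap Haar filters for the Daubechies filters so that the code stays short, and the function names are hypothetical. A library such as PyWavelets would normally be used for longer filters such as db8.

```python
import math


def haar_split(x):
    """One WPD step: low-pass (A) and high-pass (D) outputs, each decimated by 2.

    Haar filters are an assumption made for brevity; an odd trailing sample
    is dropped.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_leaves(signal, depth):
    """Return the 2**depth final-node subsignals in natural (filter-bank) order.

    Unlike the plain wavelet transform, both the A and the D branch are split
    again at every level.
    """
    nodes = [list(signal)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


# Toy input: 64 samples of a sine wave, decomposed to three levels as in Figure 2.
signal = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
leaves = wavelet_packet_leaves(signal, depth=3)
print(len(leaves), len(leaves[0]))  # 8 8
```

Because the Haar filters are orthonormal, the total energy of the final nodes equals the energy of the input when the signal length is a multiple of 2**depth, which is a convenient sanity check for any implementation.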
Features
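Norm Entropy, Log Energy Entropy, and Energy are used as features for prediction. Entropy and energy are common measures in signal processing that can extract useful information from a signal. In their standard forms they are

$$E_{\mathrm{norm}}(s) = \sum_{i} |s_i|^{p}, \qquad E_{\mathrm{log}}(s) = \sum_{i} \log\left(s_i^{2}\right), \qquad E_{\mathrm{energy}}(s) = \sum_{i} s_i^{2}$$

where $s_i$ are the coefficients of the subsignal s at a WPD node, $p \geq 1$ is the norm parameter, and $\log(0) = 0$ by convention.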
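As an illustration of how one feature value per WPD node might be computed, the sketch below assumes the definitions commonly used in wavelet toolboxes: Norm Entropy as the sum of |s_i|^p, Log Energy Entropy as the sum of log(s_i^2) with log(0) taken as 0, and Energy as the sum of s_i^2. The function names are hypothetical and the default p = 1.1 is the value reported later in the text.

```python
import math


def energy(coeffs):
    """Sum of squared coefficients of one subsignal."""
    return sum(c * c for c in coeffs)


def log_energy_entropy(coeffs):
    """Sum of log(c^2), skipping zero terms (assumed convention log(0) = 0)."""
    return sum(math.log(c * c) for c in coeffs if c * c > 0.0)


def norm_entropy(coeffs, p=1.1):
    """Sum of |c|^p, with p >= 1 (assumed standard definition)."""
    return sum(abs(c) ** p for c in coeffs)


def feature_vector(subsignals, feature=norm_entropy):
    """One feature value per WPD node, in the order the nodes are given."""
    return [feature(s) for s in subsignals]


# Toy subsignals standing in for the final nodes of a decomposition.
toy_nodes = [[1.0, -1.0], [0.5, 0.0], [2.0, 2.0]]
print(feature_vector(toy_nodes, energy))  # [2.0, 0.25, 8.0]
```

With 64 final nodes at depth 6, this yields a 64-element feature vector per window, which is the input presented to the MLP.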
Predictor: MLP
We choose MLP with backpropagation learning which can efficiently process large datasets and has been shown to be effective in black box modeling and function approximation.57,58
MLP is a network of nodes arranged in layers. A node can be modeled as an artificial neuron that computes weighted sums of inputs with bias and presents it to an activation function. A general MLP model is shown in Figure 3.

Figure 3. General architecture of the MLP.
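Linear activation functions are used for the input and output layers, and the hyperbolic tangent sigmoid activation function is used for the hidden layer(s). They have the form

$$f_{\mathrm{lin}}(x) = x, \qquad f_{\mathrm{tansig}}(x) = \frac{2}{1 + e^{-2x}} - 1 = \tanh(x)$$

where x is the weighted sum of the node inputs plus the bias.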
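The forward pass of such a network can be sketched as follows. This is a minimal illustration under stated assumptions, not the trained network from the study: the weights are arbitrary placeholders, training by backpropagation is omitted, and the function names are hypothetical. Hidden layers use the hyperbolic tangent and the output layer is linear, as described above.

```python
import math


def linear(x):
    """Linear (identity) activation, used here for the output layer."""
    return x


def dense(inputs, weights, biases, activation):
    """One layer: each node applies the activation to its weighted sum plus bias."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(features, layers):
    """Propagate a feature vector through a list of (weights, biases, activation)."""
    out = list(features)
    for weights, biases, activation in layers:
        out = dense(out, weights, biases, activation)
    return out


# Toy network: 2 inputs -> 2 tanh hidden nodes -> 1 linear output.
# The weights are illustrative placeholders, not trained values.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], math.tanh),
    ([[1.0, 1.0]], [0.5], linear),
]
print(mlp_forward([0.0, 0.0], layers))  # [0.5]
```

In the setting of this paper the input would be the vector of node features and the single linear output would be the predicted speed in km/h.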
Results and discussion
We start with the WPD of the signals. Daubechies db2 is the initial mother wavelet, and we search the Daubechies family for the best prediction accuracy. The WPD level is determined by a search that starts at 2 and increases until the best prediction accuracy is reached. We first include all nodes of the WPD up to the final level and then switch to using only the final nodes, which gives better prediction accuracy. We use Log Energy Entropy, Energy, and Norm Entropy as feature vectors and find that Norm Entropy gives the best prediction accuracy with parameter p = 1.1, which is found by a search in the interval (1, 2). The neural network parameters, which are the number of hidden layers and the number of neurons, are determined by pruning: we start with the simplest network, one hidden layer with one neuron, and increase the complexity until the best prediction accuracy is reached. The WPD depth is increased one step at a time up to level 6, which gives the best accuracy. The WPD subsignals of the final nodes at depth 6 for V = 30 km/h and V = 60 km/h are shown in Figure 4. Each horizontal line in this figure is a subsignal. The subsignals at different speeds are not easy to tell apart visually.

Figure 4. WPD subsignals of final nodes at level 6 for V = 30 km/h (left) and V = 60 km/h (right).
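The one-parameter-at-a-time search described above can be sketched as a simple loop. The stopping rule used here (stop as soon as the error no longer decreases) is an assumption, since the text only says that a parameter is increased until the best accuracy is reached. The error function is a synthetic stand-in for "decompose, extract features, train the MLP, and return the test error", and all names are hypothetical.

```python
def incremental_search(candidates, evaluate):
    """Try candidates in increasing order; stop when the error stops improving.

    `evaluate` is a callable returning a prediction error (lower is better).
    Returns the best candidate seen and its error.
    """
    best, best_err = None, float("inf")
    for cand in candidates:
        err = evaluate(cand)
        if err >= best_err:
            break  # assumed stopping rule: no further improvement
        best, best_err = cand, err
    return best, best_err


def toy_error(depth):
    """Synthetic error curve with its minimum at depth 6; not measured data."""
    return (depth - 6) ** 2 + 1.0


best_depth, best_err = incremental_search(range(2, 10), toy_error)
print(best_depth, best_err)  # 6 1.0
```

The same loop could be reused for the mother wavelet order, the norm parameter p, or the number of hidden neurons by changing the candidate list and the evaluation callable.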
The search we perform in the parameter spaces is given in Table 1. Underlined values show the parameter that is changed with respect to the previous line. Norm Entropy achieves the best result, with a mean absolute prediction error of 1.27 km/h, which corresponds to a relative error of 2.46%. The parameters that correspond to this best prediction are given in Table 1. From our earlier experience in similar applications, when Energy, Log Energy Entropy, and Norm Entropy are compared in terms of classification or prediction accuracy at one set of parameters, the feature that performs best at that set continues to perform best at other parameter values. In other words, parameters optimized using one of these features are effectively optimized for all three. Therefore, in the search listed in Tables 1 and 2, we first optimize the parameters using Norm Entropy and afterward try Log Energy Entropy and Energy; since Norm Entropy performs best, we continue with it in the rest of the study.
Table 1. WPD depth, mother wavelet, feature entropy, p value, WPD nodes used in classification, number of MLP hidden layers, number of neurons, and classification errors for prediction.
WPD: wavelet packet decomposition; MLP: multilayer perceptron.
Values that are underlined show the parameters that are changed with respect to previous line and bold lines show the best result.
Table 2. WPD depth, mother wavelet, feature entropy, p value, WPD nodes used in classification, number of MLP hidden layers, number of neurons, and classification errors for prediction.
WPD: wavelet packet decomposition; MLP: multilayer perceptron.
Underlined values show the parameters that are changed with respect to previous line and bold lines show the best result.
To see the contribution of each node to the prediction, we first plot in Figure 5 the mean values of Norm Entropy at the 64 final nodes of the WPD at depth 6 for speeds V = 30, 40, 50, 60, 70, and 80 km/h. Visually, only several nodes in the first part appear to contribute to identification by varying with speed, but it is not easy to see which nodes these are.

Figure 5. Mean values of Norm Entropy for final 64 nodes of WPD corresponding to vehicle speeds 30, 40, 50, 60, 70, and 80 km/h.
Box plots of Norm Entropy at each node for speeds V = 30, 40, 50, 60, 70, and 80 km/h, given in Figures 6 and 7, show exactly which nodes contribute to the prediction. In these figures, the central mark of each box is the median, the edges are the 25th and 75th percentiles, and the whiskers cover the most extreme data points that are not outliers. Outliers are plotted individually as red “+” signs. These plots show that the speeds are distinguishable from each other only at the first 16 nodes, so only these nodes contribute to the prediction. These are the lower frequency nodes. The remaining, higher frequency nodes show less variation and overlap with each other across speeds, so they appear to be of little use for prediction. We therefore continue the analysis by keeping only the first 16 of the final nodes of the WPD at depth 6.

Figure 6. Box plot of Norm Entropy for nodes 1–32 of final nodes of WPD at depth 6 corresponding to vehicle speeds 30, 40, 50, 60, 70, and 80 km/h.

Figure 7. Box plot of Norm Entropy for nodes 33–64 of final nodes of WPD at depth 6 corresponding to vehicle speeds 30, 40, 50, 60, 70, and 80 km/h.
Table 2 shows our results using only the first 16 nodes. We again search the parameter spaces to fine-tune the parameters. These results are better than those in Table 1, which shows that keeping only the first 16 nodes was the right choice. Since the input size of the MLP has changed, we fine-tune the MLP parameters as well. The best prediction, with a mean absolute error of 1.11 km/h and a corresponding relative error of 2.11%, is achieved by an MLP with two hidden layers of 10 and 5 neurons. Figure 8 shows the training performance of this final MLP.

Figure 8. Training performance of the MLP with two hidden layers of 10 and 5 neurons.
Figure 9 shows the prediction results by plotting the actual and predicted speeds for all test instances. The prediction is mostly consistent, that is, random fluctuations in the prediction are minor. The fluctuations that do appear affect groups of neighboring instances, which points to inaccuracy in the control and measurement of the speed. As mentioned earlier, speed control and measurement are done using the cruise control of the vehicle, a device that tries to keep the speed fixed at a set value. As with all control systems, it does so only within a certain uncertainty, and variations in road slope cause the vehicle speed to follow a variable pattern. This suggests that the method would perform better if measurement and control were more precise. Under these conditions, the results indicate that our method is able to predict the speed of the vehicle with 1.11 km/h mean absolute error and the corresponding 2.11% relative error, with ±0.453 s time sensitivity, by sound emission analysis using an on-board microphone. These results are for the speed range between 30 and 80 km/h under steady-state conditions. With tests on other vehicles, at other speed ranges, and under dynamic conditions, the method can be extended to the design of a new kind of speed sensor. Speed measurement and control must be done more precisely in these tests. Time sensitivity may also be improved by reducing the window size, and the algorithm can be optimized along with the use of a faster CPU. These steps are left to a future study in order to generalize the method to the design of a new kind of vehicle speed sensor.

Figure 9. Actual speeds (blue) and predicted speeds (red) for all test instances.
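The error measures quoted above can be sketched as follows. The exact definitions are assumptions: mean absolute error is taken as the average of |actual − predicted|, relative error as the average of |actual − predicted| / actual in percent, and accuracy as 100% minus the relative error, which is consistent with the reported pair of 2.11% and 97.89%. The speeds in the example are toy values, not data from the study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference, in the units of the inputs (km/h here)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error_percent(actual, predicted):
    """Average of |a - p| / a, in percent (assumed per-instance definition)."""
    total = sum(abs(a - p) / a for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def accuracy_percent(actual, predicted):
    """Accuracy taken as 100% minus the mean relative error (assumed)."""
    return 100.0 - mean_relative_error_percent(actual, predicted)


# Toy values for illustration only.
actual = [30.0, 50.0, 80.0]
predicted = [31.5, 49.0, 80.0]
print(mean_absolute_error(actual, predicted))
print(mean_relative_error_percent(actual, predicted))
print(accuracy_percent(actual, predicted))
```

Because the relative error divides by the actual speed, the same absolute error weighs more heavily at 30 km/h than at 80 km/h, which matters when comparing results across speed ranges.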
Conclusion
An approach for predicting vehicle speed by sound emission analysis using an on-board microphone is presented. WPD is used as the analysis tool and is explored for different wavelet base functions at various depths. The Daubechies db8 mother wavelet is found to give the best result at depth 6. Log Energy Entropy, Norm Entropy, and Energy are explored as features, and Norm Entropy is found to give the best accuracy with p = 1.1. Using only the final nodes of the WPD gives better prediction accuracy than using all nodes. Analysis of the variation of Norm Entropy among the final nodes shows that the first 16 nodes, which are the lower frequency nodes, are distinct under varying speed; therefore only these nodes contribute to the prediction, and only they are kept in the remaining part of the analysis. Fine-tuning of the parameters is finalized by pruning the MLP, which arrives at two hidden layers of 10 and 5 neurons as the configuration with the best accuracy. This search in the parameter spaces serves as an optimization and contributes to the success of the method. Under the limitations of controlling the vehicle speed with the cruise control of the vehicle, an average prediction rate of 97.89% is achieved with 1.11 km/h mean absolute error and 2.11% relative error. This result is for the speed range between 30 and 80 km/h under steady-state conditions. The plot of the actual and predicted speeds shows that the predicted speeds sometimes fluctuate together over a group of neighboring instances, which indicates uncertainty in the control and measurement of the vehicle speed with cruise control because of varying road slopes. In a future study, with better control and measurement of the vehicle speed and with tests on other vehicles, at other speed ranges, and under dynamic conditions, the proposed method can be generalized for use in the development of a new kind of speed sensor. Time sensitivity is ±0.453 s and can be improved using other window widths. The algorithms can be optimized along with the use of a faster CPU. Overall, the current results present a promising candidate for the development of a new kind of vehicle speed sensor based on sound signal analysis using an on-board microphone.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
