Enhancement of speech using deep neural network with discrete cosine transform

Abstract

In a digitized world, users demand signals of high quality and accuracy. In practice, nearly all signals are analog in nature and contaminated with noise. This paper considers the speech signal, which varies from person to person and from time to time and must be enhanced for applications in engineering, medicine, and social settings. Enhancement reduces both the noise and the redundant data in the signal. Because speech is nonstationary, it is first preprocessed and normalized. The spectral domain is well suited to analyzing speech, and the Discrete Cosine Transform (DCT-II) is used for this purpose because it offers advantages over other transforms and is simpler to compute. The DCT-II coefficients are then fed to a Deep Neural Network (DNN) model that reduces the noise and enhances the signal, with the aim of enhancing speech recorded in any environment and at any noise level. The dataset comprises 100 sentences spoken by 10 speakers, 5 male and 5 female, each uttering 10 sentences. Although DCT-II and DNNs have been applied by many researchers to signal feature extraction and image classification, their use here for speech enhancement is the novelty of this work. The results are better than those of previously applied methods, and the approach is suited to real-time applications. The results section presents visual inspection alongside comparison values, and the evaluation measures demonstrate the method's efficacy.
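The abstract does not give the framing, normalization, or network details, so the following is only a minimal sketch of a DCT-II analysis/synthesis stage under stated assumptions: peak normalization, non-overlapping 256-sample frames with zero padding, and an orthonormal DCT-II matrix built directly with NumPy. The `model` argument is a hypothetical stand-in for the trained DNN; with the default identity mapping the pipeline simply reconstructs the normalized input, which checks that the transform pair is lossless.

```python
from typing import Callable, Tuple

import numpy as np


def dct2_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)   # DC row scaling
    c[1:, :] *= np.sqrt(2.0 / n)  # AC row scaling
    return c


def normalize(x: np.ndarray) -> np.ndarray:
    """Peak-normalize to [-1, 1] (assumed scheme; the abstract does not specify one)."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x


def frames_to_dct(x: np.ndarray, frame_len: int = 256) -> Tuple[np.ndarray, int]:
    """Split into non-overlapping, zero-padded frames and take the DCT-II of each."""
    pad = (-len(x)) % frame_len
    frames = np.concatenate([x, np.zeros(pad)]).reshape(-1, frame_len)
    return frames @ dct2_matrix(frame_len).T, len(x)


def dct_to_signal(coeffs: np.ndarray, length: int) -> np.ndarray:
    """Invert the per-frame DCT-II (C is orthogonal, so C^T is its inverse) and drop padding."""
    frames = coeffs @ dct2_matrix(coeffs.shape[1])
    return frames.reshape(-1)[:length]


def enhance(x: np.ndarray,
            model: Callable[[np.ndarray], np.ndarray] = lambda c: c,
            frame_len: int = 256) -> np.ndarray:
    """Normalize -> DCT-II -> model (hypothetical DNN stand-in) -> inverse DCT."""
    coeffs, n = frames_to_dct(normalize(np.asarray(x, dtype=float)), frame_len)
    return dct_to_signal(model(coeffs), n)


# Usage: with the identity model the output is just the peak-normalized input.
rng = np.random.default_rng(0)
noisy = rng.standard_normal(1000)
out = enhance(noisy)
```

Because the orthonormal DCT-II matrix satisfies C Cᵀ = I, its transpose implements the inverse (DCT-III), so no separate inverse routine is needed. A practical system would more likely use overlapping windowed frames with overlap-add; non-overlapping frames are used here only to keep the sketch short.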

Keywords

Discrete cosine transform; deep neural network; speech enhancement; perceptual evaluation of speech quality; segmental signal-to-noise ratio
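The keywords name segmental signal-to-noise ratio (SegSNR) as one of the evaluation measures. The abstract does not state the frame length or limits, so the sketch below assumes non-overlapping 256-sample frames and clamps each frame's SNR to [-10, 35] dB, a convention commonly used in the speech-enhancement literature. The function name `segmental_snr` and its parameters are illustrative, not taken from the paper. PESQ is not sketched because it is defined by the ITU-T P.862 standard and should be computed with a reference implementation.

```python
import numpy as np


def segmental_snr(clean: np.ndarray, enhanced: np.ndarray, frame_len: int = 256,
                  lo: float = -10.0, hi: float = 35.0, eps: float = 1e-10) -> float:
    """Mean over frames of 10*log10(sum x^2 / sum (x - x_hat)^2), each frame clamped to [lo, hi] dB."""
    n = min(len(clean), len(enhanced))
    num_frames = n // frame_len
    if num_frames == 0:
        raise ValueError("signals are shorter than one frame")
    usable = num_frames * frame_len  # trailing partial frame is ignored
    c = np.asarray(clean[:usable], dtype=float).reshape(num_frames, frame_len)
    e = np.asarray(enhanced[:usable], dtype=float).reshape(num_frames, frame_len)
    signal_energy = np.sum(c ** 2, axis=1)
    noise_energy = np.sum((c - e) ** 2, axis=1)
    # eps avoids division by zero and log of zero on silent or perfect frames
    frame_snr = 10.0 * np.log10((signal_energy + eps) / (noise_energy + eps))
    return float(np.mean(np.clip(frame_snr, lo, hi)))
```

For example, an estimate that is uniformly 0.9 times a constant clean signal leaves a residual with 1% of the clean energy in every frame, giving 20 dB; the clamp keeps near-perfect frames from dominating the average.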

