Speech Enhancement using Fully Connected Deep Neural Network for Hindi Speech Corrupted by Nonstationary Noises

Abstract

A Fully Connected Deep Neural Network (FCDNN) is used to enhance Hindi speech contaminated by a diverse range of background noises. The database includes both stationary and nonstationary noises: Car Noise, Factory Noise, Machine Gun Noise and Fighter Plane Noise. These noises are added artificially to the clean speech signal at input Signal-to-Noise Ratio (SNR) levels of −5, 0, 5 and 10 dB to simulate real-world scenarios with different levels of noise interference. Machine Gun Noise and Factory Noise are more nonstationary than Car Noise and Fighter Plane Noise. This distinction underlines the importance of evaluating speech enhancement systems under diverse noise conditions to assess their robustness in real-world applications. The proposed system demonstrates significant improvements in SNR, PESQ and STOI for all four noises. Even for speech corrupted by highly nonstationary Machine Gun Noise at an input SNR of −5 dB, an SNR improvement of 13.94 dB is observed, with a PESQ value of 2.91 and an STOI of 0.94, indicating that the quality and intelligibility of the recovered speech are retained. These results highlight the effectiveness of FCDNN-based approaches in removing both stationary and nonstationary background noises from corrupted speech signals. Overall, this research contributes to enhancing the quality and intelligibility of speech signals in noisy environments by leveraging the capabilities of deep learning techniques.
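The noise-mixing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `mix_at_snr` and `snr_db` are hypothetical, the noise is assumed to be tiled or trimmed to the length of the clean signal, and SNR improvement is assumed to mean output SNR minus input SNR, since the abstract does not state the exact definition used.

```python
import numpy as np


def mix_at_snr(clean, noise, target_snr_db):
    """Add `noise` to `clean` so the mixture has the requested SNR (in dB)."""
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Assumption: repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    # Scale the noise so that P_clean / P_scaled_noise = 10^(SNR/10).
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    return clean + gain * noise


def snr_db(reference, estimate):
    """SNR of `estimate` relative to the clean `reference`, in dB."""
    reference = np.asarray(reference, dtype=np.float64)
    error = reference - np.asarray(estimate, dtype=np.float64)
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for a clean utterance and a noise recording.
    clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
    noise = rng.standard_normal(4000)

    for level in (-5, 0, 5, 10):  # input SNR levels named in the abstract
        noisy = mix_at_snr(clean, noise, level)
        print(level, round(snr_db(clean, noisy), 2))
```

Under the same assumption, the SNR improvement for an enhanced signal `enhanced` would be `snr_db(clean, enhanced) - snr_db(clean, noisy)`.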

Keywords

FCDNN, SNR, PESQ, STOI, spectral kurtosis, validation, STFT, backpropagation, stationary and nonstationary noise

