Speech enhancement algorithm of improved OMLSA based on bilateral spectrogram filtering

Abstract

This paper proposes an optimally modified log-spectral amplitude (OMLSA) estimator based on bilateral spectrogram filtering (BSF) for single-channel speech enhancement. The method preprocesses the spectrogram of the noisy speech with bilateral filtering (BF), a technique widely used in image and visual processing, and significantly improves the performance of OMLSA, especially in highly non-stationary noise environments. Treating the speech spectrogram as an image, BSF sharpens details and removes unwanted textures and background noise while preserving edges. The a posteriori signal-to-noise ratio (SNR) used by the OMLSA algorithm is estimated after BSF has been applied to the noisy speech. To reduce the computational cost, a fast and accurate BF is adopted, which reduces the complexity to O(1) per time-frequency bin. Finally, the proposed algorithm is compared with the original OMLSA and other classic denoising methods under various noise types and SNRs, using objective metrics such as segmental SNR improvement and perceptual evaluation of speech quality (PESQ). The results show the validity of the improved BSF-based OMLSA algorithm.
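The idea of filtering a spectrogram as if it were an image can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain brute-force bilateral filter with Gaussian spatial and range kernels instead of the fast O(1) variant mentioned above, and the function names, the dB-domain filtering, the parameter values (`radius`, `sigma_s`, `sigma_r`) and the externally supplied noise PSD are all assumptions made for illustration.

```python
import numpy as np


def bilateral_spectrogram_filter(log_spec_db, radius=2, sigma_s=1.5, sigma_r=6.0):
    """Brute-force bilateral filter over a (frequency, time) log-magnitude
    spectrogram in dB. All parameter defaults are illustrative assumptions."""
    log_spec_db = np.asarray(log_spec_db, dtype=float)
    n_freq, n_time = log_spec_db.shape
    # Replicate border values so every bin has a full neighbourhood.
    padded = np.pad(log_spec_db, radius, mode="edge")
    num = np.zeros_like(log_spec_db)
    den = np.zeros_like(log_spec_db)
    for df in range(-radius, radius + 1):
        for dt in range(-radius, radius + 1):
            shifted = padded[radius + df:radius + df + n_freq,
                             radius + dt:radius + dt + n_time]
            # Spatial kernel: closeness in the time-frequency plane.
            w_spatial = np.exp(-(df ** 2 + dt ** 2) / (2.0 * sigma_s ** 2))
            # Range kernel: similarity in level, which is what preserves edges.
            w_range = np.exp(-((shifted - log_spec_db) ** 2) / (2.0 * sigma_r ** 2))
            weight = w_spatial * w_range
            num += weight * shifted
            den += weight
    # The centre bin always contributes weight 1, so den is never zero.
    return num / den


def a_posteriori_snr(power_spec, noise_psd, floor=1e-12):
    """A posteriori SNR: noisy power divided by the noise PSD estimate.
    The noise PSD is assumed to come from a separate estimator."""
    return np.asarray(power_spec, dtype=float) / np.maximum(noise_psd, floor)
```

Under these assumptions, a filtered dB spectrogram `s` would be converted back to power with `10 ** (s / 10)` before being passed to `a_posteriori_snr`. The double loop costs on the order of `(2 * radius + 1) ** 2` operations per time-frequency bin, which is the cost that a fast O(1) bilateral filter is meant to avoid.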

Keywords

Speech enhancement; bilateral filtering; optimally modified log-spectral amplitude; bilateral spectrogram filtering; spectrogram
