Cascaded Feedforward Neural Networks for speaker identification using Perceptual Wavelet based Cepstral Coefficients

Abstract

Rapid advances in technology have led to a sharp rise in the use of biometric authentication systems. Speaker identification is one such biometric; it remains an active research area with growing demands on accuracy. The objective of this research is to address text-independent speaker identification, that is, identifying a speaker from voice regardless of the spoken content. An approach based on the Perceptual Wavelet Packet Transform (PWPT) and Artificial Neural Networks (ANN) is presented. Perceptual Wavelet Packet Cepstral Coefficients (PWPCC) are used to transform speech into spectral feature vectors, and the most relevant aspects of the speech signal are selected from its energy and variance distribution characteristics. The selected attributes are presented to a Cascaded Feedforward Neural Network (CFNN), which is trained with the Levenberg-Marquardt Back Propagation (LMBP) algorithm for classification. Network performance is measured by the Speaker Identification Rate (SIR). For comparison, five different gradient descent training algorithms are considered, and LMBP is found to perform better. The proposed model is evaluated on clean as well as noisy speech at various SNR levels and is found to be competitive; the experimental results show a significant improvement in SIR compared with classical methods.
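The evaluation described above (SIR on clean and noisy speech at several SNR levels) can be illustrated with a minimal sketch. It assumes that SIR is the percentage of test utterances assigned to the correct speaker, which the abstract does not define explicitly, and that noisy test speech is obtained by scaling a noise signal to a target SNR before adding it. The function names `add_noise_at_snr` and `speaker_identification_rate` are hypothetical and are not taken from the paper.

```python
import math


def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB, then add it.

    Assumption: SNR is defined on average power,
    SNR_dB = 10 * log10(P_signal / P_noise).
    """
    if len(signal) != len(noise) or not signal:
        raise ValueError("signal and noise must be non-empty and equal in length")
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise must have non-zero power")
    # Noise power needed to reach the target SNR.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(signal, noise)]


def speaker_identification_rate(true_ids, predicted_ids):
    """Percentage of test utterances assigned to the correct speaker.

    Assumption: this is the usual definition of SIR; the paper's exact
    formula is not given in the abstract.
    """
    if len(true_ids) != len(predicted_ids) or not true_ids:
        raise ValueError("label lists must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(true_ids, predicted_ids))
    return 100.0 * correct / len(true_ids)
```

In such a setup, each test utterance would be mixed with noise at the chosen SNR, passed through feature extraction and the trained classifier, and the predicted speaker labels compared with the true ones to obtain one SIR value per SNR level.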

Keywords

Perception, wavelet, speaker, speech, neural network
