Complementing the DTW based speaker verification systems with knowledge of specific regions of interest

Abstract

In recent times, Dynamic Time Warping (DTW) based template matching systems have again come to the forefront in the field of text-dependent speaker verification. Their integration with recent techniques, such as i-vector/Probabilistic Linear Discriminant Analysis (PLDA) and Deep Neural Networks (DNNs), has significantly improved system performance. The DTW algorithm time-aligns two templates and gives a similarity score based on the optimal warping path. However, it weighs all local distances along the optimal path equally. In this paper, we propose complementing DTW-based text-dependent speaker verification systems with local scores derived from the vicinity of speaker-identity-rich regions. Vowel regions are used to determine the portions of the warping path that carry more speaker-discriminating information. Two systems, namely the DTW/Mel-frequency Cepstral Coefficients (MFCC) system and the online i-vector/PLDA/DTW system, have been extended to incorporate knowledge of these regions of interest. The results have been evaluated on Part 1 of the RSR2015 database. Relative improvements of up to 11.85% and 49.41% are observed for the extended systems based on MFCC and i-vector, respectively.
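The idea of scoring only part of the warping path can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dtw_path` and `dtw_scores`, the boolean mask `roi_mask` (standing in for detected vowel regions on the first template), the Euclidean local distance, and the averaging of distances along the path are all assumptions made for illustration.

```python
import numpy as np

def dtw_path(a, b):
    """Align feature sequences a (n x d) and b (m x d) with plain DTW.

    Returns the optimal warping path as (i, j) frame pairs and the
    local distance at each point on that path.
    """
    n, m = len(a), len(b)
    # Local Euclidean distance between every pair of frames (an assumption).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = d[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end of both sequences to recover the path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return path, np.array([d[p, q] for p, q in path])

def dtw_scores(a, b, roi_mask):
    """Return (global_score, local_score) as mean distances (lower is closer).

    global_score averages every local distance on the path equally;
    local_score averages only path points whose frame in `a` is flagged
    in roi_mask (hypothetical region-of-interest labels).
    """
    path, dists = dtw_path(a, b)
    global_score = float(dists.mean())
    in_roi = np.array([bool(roi_mask[p]) for p, _ in path])
    # Fall back to the global score if no path point lies in a flagged region.
    local_score = float(dists[in_roi].mean()) if in_roi.any() else global_score
    return global_score, local_score
```

Calling `dtw_scores(enrol, test, roi_mask)` with two feature matrices and a per-frame mask returns both scores, which a system could then combine; how the paper fuses the local and global scores is not stated in the abstract, so no fusion rule is shown here.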

Keywords

DTW, vowel regions, online i-vector, text-dependent speaker verification
