Abstract
Dynamic Time Warping (DTW) based template matching systems have recently returned to the forefront of text-dependent speaker verification. Their integration with recent techniques such as i-vector/Probabilistic Linear Discriminant Analysis (PLDA) and Deep Neural Networks (DNN) has significantly improved system performance. The DTW algorithm time-aligns two templates and gives a similarity score based on the optimal warping path. However, it weighs all local distances along the optimal path equally. In this paper, we propose complementing DTW based text-dependent speaker verification systems with local scores derived from the vicinity of speaker-identity-rich regions. Vowel regions are used to determine the portions of the warping path that carry more speaker-discriminating information. Two systems, namely the DTW/Mel-frequency Cepstral Coefficients (MFCC) system and the online i-vector/PLDA/DTW system, have been extended to incorporate knowledge of these regions of interest. The systems are evaluated on Part 1 of the RSR2015 database. Relative improvements of up to 11.85% and 49.41% are observed for the extended systems based on MFCC and i-vector, respectively.
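To illustrate the idea of a global DTW score complemented by a local score from selected regions, the following is a minimal sketch and not the method evaluated in the paper. It assumes Euclidean local distances, the basic three-step DTW recursion, and a boolean per-frame mask standing in for detected vowel regions. The names `dtw_with_path`, `global_and_region_scores`, `fuse_scores`, and the weight `alpha` are hypothetical and do not come from the paper.

```python
import numpy as np

def dtw_with_path(x, y):
    """Align two feature sequences (frames x dims) with basic DTW.
    Returns the local distance matrix and the optimal warping path."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    # Euclidean distance between every pair of frames (an assumption).
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    # Accumulated cost, padded with an infinite border.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    # Backtrack from the last cell to recover the optimal path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    path.reverse()
    return local, path

def global_and_region_scores(local, path, region_mask):
    """Average local distance over the whole path, and over only those
    path points whose x-frame lies in a region of interest.
    region_mask[i] is True when frame i of x is in such a region
    (a hypothetical stand-in for a vowel-region detector)."""
    d = np.array([local[i, j] for i, j in path])
    in_region = np.array([region_mask[i] for i, _ in path], dtype=bool)
    global_score = float(d.mean())
    # Fall back to the global score when no path point is in a region.
    region_score = float(d[in_region].mean()) if in_region.any() else global_score
    return global_score, region_score

def fuse_scores(global_score, region_score, alpha=0.5):
    """Hypothetical linear fusion; alpha is an illustrative weight."""
    return (1.0 - alpha) * global_score + alpha * region_score
```

For example, `local, path = dtw_with_path(test_feats, enrol_feats)` followed by `global_and_region_scores(local, path, mask)` gives the two distances, which `fuse_scores` then combines. Lower values indicate a closer match in this sketch.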
