Speaker recognition using temporal information and session variability compensation in a binary framework

Abstract

In recent years, a simple representation of a speech excerpt has been proposed: a binary matrix that gives easy access to speaker-discriminant information. Beyond its time-related capabilities, this representation also supports a temporal information representation based on the sequential changes present in the binary matrix. This paper proposes a new type of temporal information for use in speaker recognition systems, together with a new specificity selection approach that applies a mask in the cumulative vector space. In this space, temporal information can also be exploited to compensate for the effects of session variability. A new variability compensation method in the temporal space is therefore proposed to remove the unwanted attributes of session variability and the attributes common to all speakers, with the aim of increasing effectiveness in the speaker binary key paradigm. Experimental validation on the NIST-SRE framework demonstrates the efficiency of the proposed solutions, with an EER improvement of 9%. Combining the i-vector and binary approaches, using the proposed methods, showed that the discriminatory information exploited by each is complementary.
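
As a rough illustration of the quantities the abstract names, the Python sketch below builds a cumulative vector from a toy binary matrix, counts sequential changes per component, and applies a simple mask. Everything in it is an assumption made for illustration: the matrix orientation (rows as frames, columns as specificities), the function names, the reading of "sequential changes" as 0/1 flips between consecutive frames, and the top-k mask, which stands in for the paper's specificity selection and is not taken from it.

```python
from typing import List

Matrix = List[List[int]]


def cumulative_vector(binary_matrix: Matrix) -> List[int]:
    """Sum each column: number of frames in which each specificity is active."""
    return [sum(col) for col in zip(*binary_matrix)]


def transition_counts(binary_matrix: Matrix) -> List[int]:
    """Count 0->1 and 1->0 flips between consecutive frames, per column.

    Assumption: this is one possible reading of 'sequential changes'.
    """
    return [
        sum(a != b for a, b in zip(col, col[1:]))
        for col in zip(*binary_matrix)
    ]


def top_k_mask(vector: List[int], k: int) -> List[int]:
    """Binary mask keeping the k largest components (ties go to lower index).

    Assumption: a hypothetical stand-in for a mask in the cumulative space.
    """
    order = sorted(range(len(vector)), key=lambda i: (-vector[i], i))
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(vector))]


# Toy excerpt: 5 frames (rows) by 4 specificities (columns).
toy = [
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]

cumulative = cumulative_vector(toy)    # [4, 1, 3, 3]
changes = transition_counts(toy)       # [2, 2, 2, 1]
mask = top_k_mask(cumulative, k=2)     # [1, 0, 1, 0]
```

In this toy case the last two columns have the same cumulative count but different numbers of flips, which is the kind of distinction a temporal representation can capture and a cumulative vector alone cannot.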

Keywords

Speaker recognition, binary key representation, accumulative vector, temporal information

