Random Fourier feature based music-speech classification

Abstract

This paper proposes a Random Kitchen Sink based approach to music/speech classification. Temporal and spectral features such as spectral centroid, spectral roll-off, spectral flux, Mel-frequency cepstral coefficients, entropy, and zero-crossing rate are extracted from the signals. Although speech and music signals differ in their production mechanisms, they share many characteristics, such as overlapping frequency ranges and comparatively non-stationary behaviour, which makes classification difficult. The proposed approach explicitly maps the data to a feature space in which it is linearly separable. Experimental evaluations and comparisons show that the proposed approach achieves scores competitive with the methods in the available literature.
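The explicit mapping mentioned in the abstract can be illustrated with a minimal sketch of a random Fourier feature map, the construction that Random Kitchen Sinks build on. This is not the authors' implementation: the helper names (`make_rff_map`, `rbf_kernel`), the choice of an RBF kernel, and the values of `gamma`, `num_features`, and the seed are all assumptions made for illustration. The sketch checks that inner products of the mapped vectors approximate the kernel value.

```python
import math
import random


def make_rff_map(input_dim, num_features, gamma, seed=0):
    """Return z(x), a random Fourier feature map for the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2). All parameter values are illustrative."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density: N(0, 2 * gamma).
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(input_dim)]
               for _ in range(num_features)]
    # Random phase offsets, uniform on [0, 2*pi].
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    norm = math.sqrt(2.0 / num_features)

    def z(x):
        # Each feature is a scaled cosine of a random projection of x.
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return z


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel value, used only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical 3-dimensional feature vectors standing in for audio features.
gamma = 0.5
z = make_rff_map(input_dim=3, num_features=2000, gamma=gamma, seed=0)
x = [0.2, -0.1, 0.4]
y = [0.5, 0.3, -0.2]

approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = rbf_kernel(x, y, gamma)
print(f"exact kernel = {exact:.3f}, random-feature approximation = {approx:.3f}")
```

Because the kernel is approximated by an ordinary inner product of `z(x)` and `z(y)`, a linear classifier trained on the mapped vectors behaves approximately like a kernel classifier on the original features.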

Keywords

Music/speech, random kitchen sink, feature vector, GTZAN database, S&S database, spectral features
