Abstract
This paper presents a technique to detect the six affective states of an individual using audio cues. Bi-spectral features extracted from the entire speech signal and from the voiced part of speech are used to create feature vectors. For classification, K-Nearest Neighbor (KNN) and Simple Logistic (SL) classifiers are used. The eNTERFACE audio-visual emotional speech corpus, which covers six archetypal affective states (fear, anger, disgust, sadness, happiness, and surprise), is considered. The performance of the system is analyzed for features obtained from the voiced part of speech versus features obtained from the entire speech signal. The proposed work is the first of its kind in affect computation in which a compact 13-dimensional bi-spectral feature vector extracted from the voiced speech segments yields promising performance. Compared to existing approaches evaluated on emotion samples from the same speech corpus, the proposed methodology achieves a considerable improvement of 8.46%–27.6% in recognition rate, adding novelty to the proposed work.
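To make the described pipeline concrete, the following is a minimal Python sketch of this style of processing: frame the signal, keep only voiced frames, accumulate a direct (FFT-based) bispectrum estimate, reduce it to a 13-dimensional summary, and feed the result to a KNN classifier. The frame length, hop size, energy-based voiced/unvoiced decision, and radial-binning reduction are all illustrative assumptions, not the paper's exact feature definition.

```python
# Illustrative sketch (not the authors' code) of bispectrum-based
# emotion features from voiced speech frames, classified with KNN.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def frame_signal(x, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping frames (pads a short signal)."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n = (len(x) - frame_len) // hop + 1
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def is_voiced(frame, energy_thresh=1e-3):
    """Crude voiced/unvoiced decision via short-time energy (assumption)."""
    return np.mean(frame ** 2) > energy_thresh

def bispectrum_features(x, frame_len=512, hop=256, n_feat=13):
    """Average the bispectrum magnitude over voiced frames, then reduce
    the 2-D estimate to n_feat radial-band averages (assumption)."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    idx = np.arange(frame_len)
    window = np.hanning(frame_len)
    acc = np.zeros((frame_len, frame_len))
    count = 0
    for f in frames:
        if not is_voiced(f):
            continue
        F = np.fft.fft(f * window)
        # Direct bispectrum estimate: B(f1, f2) = X(f1) X(f2) X*(f1 + f2),
        # with frequency indices taken modulo frame_len (FFT periodicity).
        B = F[:, None] * F[None, :] \
            * np.conj(F[(idx[:, None] + idx[None, :]) % frame_len])
        acc += np.abs(B)
        count += 1
    if count == 0:  # no voiced frames found
        return np.zeros(n_feat)
    acc /= count
    # Collapse the 2-D bispectrum into n_feat radial-band averages.
    fx, fy = np.meshgrid(idx, idx)
    r = np.hypot(fx, fy)
    bins = np.linspace(0.0, r.max() + 1e-9, n_feat + 1)
    return np.array([acc[(r >= bins[k]) & (r < bins[k + 1])].mean()
                     for k in range(n_feat)])

# Usage: rows of X_train are 13-D feature vectors, y_train the six
# emotion labels; the paper's SL classifier would be swapped in similarly.
# knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
# pred = knn.predict([bispectrum_features(test_signal)])
```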
